(57) A data reception apparatus is provided with a
data request/reception unit for requesting media data 
corresponding to first and second foreground images 
from servers having the media data, and receiving a 
message from each server; and a control data generation
unit for controlling the data request/reception unit
such that it issues a request message for each media 
data at a time by a predetermined latency time earlier 
than the display start time of each foreground image, on 
the basis of information indicating the latency time be- 
fore starting display of the foreground image, which in- 
formation is included in SMIL data indicating scene de- 
scription. Therefore, each of the first and second fore- 
ground images can be combined with a background im- 
age and displayed at a time designated in the scene de- 
scription. 
Description

FIELD OF THE INVENTION 

[0001] The present invention relates to data reception
apparatuses, data reception methods, data transmis- 
sion methods, and data storage media. More particular- 
ly, the invention relates to a transmission process of 
transmitting control data including a storage location 
and a playback start time of media data from a server 
distributing the media data, a reception process of ac- 
cessing the server to receive and play the media data, 
and a data storage medium having a program for mak- 
ing a computer perform the above-mentioned transmis- 
sion process and reception process. 

BACKGROUND OF THE INVENTION 

[0002] In recent years, with the advance of compres- 
sive coding technology for video data and audio data 
and the increase in data transmission capacity of net- 
works such as the Internet and wireless networks, we 
can see services handling data such as video, audio, 
text, and the like, which are called media data. 
[0003] These services have conventionally been dis- 
tributed by a downloading scheme. In the downloading 
scheme, all of media data required for playback are 
downloaded from a server to a client terminal through a 
network and, after completion of the downloading, play- 
back and display of the media data are performed at the 
client terminal. 

[0004] Recently, the services handling the above- 
mentioned media data have come to adopt a streaming 
scheme instead of the downloading scheme. In the 
streaming scheme, reception of media data from a serv- 
er at a client terminal through a network is performed in 
parallel with playback and display of the received media 
data at the client terminal. 

[0005] Since, in the streaming scheme, playback and 
display of the media data are performed before recep- 
tion of the media data is completed, the most striking 
characteristic of the streaming scheme is that a service 
adopting this scheme can reduce the latency time from 
when program data is requested to when playback and 
display of the program data are performed even when 
the service distributes a long program.
[0006] In the future, services distributing media data 
as described above will go beyond playback and display 
of single media data such as video data or audio data, 
to be extended to services capable of simultaneous 
playback and display of plural pieces of media data, 
such as video data, still-picture data, text data, and the 
like. 

[0007] Hereinafter, a description will be given of a 
process of simultaneously playing plural pieces of me- 
dia data by the streaming scheme to display, for exam- 
ple, one background and two foregrounds at the same 
time. 



[0008] Figure 11(a) is a diagram for explaining the 
spatial arrangement of media data. 
[0009] In figure 11(a), a predetermined image space 
1100 is a rectangle background display region (bg re- 
gion) 1110 where a background image (bg) is displayed. 
In the rectangle background display region 1110, there 
are a first rectangle foreground display region (adv re- 
gion) 1120 where a first foreground image (adv) that is 
a picture of an advertisement or the like is placed, and 
a second rectangle foreground display region (mov re- 
gion) 1130 where a second foreground image (mov) as 
a moving picture is placed. 

[0010] For the predetermined image space 1100, a
coordinate system indicating the positions in the image
space 1100 is defined by the number of horizontal points
corresponding to the number of pixels in the horizontal
direction and the number of vertical points corresponding
to the number of pixels in the vertical direction. For
example, the upper left corner of the background display
region (entire scene) 1110 is in a position where the
number of horizontal points and the number of vertical
points are 0. The size of the background display region
(entire scene) 1110 in the horizontal direction (width) is
300 points, and the size of the background display
region 1110 in the vertical direction (height) is 200 points.
The upper left corner of the first foreground display
region (adv region) 1120 is in a position where the number
of horizontal points is 0 and the number of vertical points
is 150. The size of the first foreground display region
1120 in the horizontal direction (width) is 300 points, and
the size of the first foreground display region 1120 in the
vertical direction (height) is 50 points. The upper left
corner of the second foreground display region (mov
region) 1130 is in a position where the number of
horizontal points is 50 and the number of vertical points is 0.
The size of the second foreground display region 1130
in the horizontal direction (width) is 200 points, and the
size of the second foreground display region 1130 in the
vertical direction (height) is 150 points.
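
The spatial arrangement described above can be sketched as a small data structure. This is only an illustrative sketch: the class name Region and the helper methods contains and overlaps are assumptions introduced here, not part of the described apparatus. The coordinates and sizes are the ones stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in the image space, measured in points."""
    name: str
    left: int    # horizontal position of the upper left corner
    top: int     # vertical position of the upper left corner
    width: int
    height: int

    def contains(self, other: "Region") -> bool:
        """True if `other` lies entirely inside this region."""
        return (self.left <= other.left
                and self.top <= other.top
                and other.left + other.width <= self.left + self.width
                and other.top + other.height <= self.top + self.height)

    def overlaps(self, other: "Region") -> bool:
        """True if the two regions share any interior area."""
        return (self.left < other.left + other.width
                and other.left < self.left + self.width
                and self.top < other.top + other.height
                and other.top < self.top + self.height)


# Values taken from the description of figure 11(a).
BG = Region("bg", left=0, top=0, width=300, height=200)
ADV = Region("adv", left=0, top=150, width=300, height=50)
MOV = Region("mov", left=50, top=0, width=200, height=150)

if __name__ == "__main__":
    for fg in (ADV, MOV):
        print(fg.name, "inside bg:", BG.contains(fg))
    print("adv overlaps mov:", ADV.overlaps(MOV))
```

With these values, both foreground regions fit inside the background region, and the two foreground regions touch along the line at 150 vertical points without overlapping.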
[0011] Figure 11(b) is a diagram for explaining the 
temporal arrangement of the media data, showing the 
timings when the background image and the first and 
second foreground images are displayed in the prede- 
termined image space. 

[0012] In the temporal arrangement of the media data
shown in figure 11(b), when a reference time T of the
client terminal becomes a display start time Tbg
(Tbg=0sec.) of the background image, the background
image (bg) appears in the image space 1100. Further,
when the reference time T of the client terminal becomes
a display start time Tadv (Tadv=5sec.) of the first
foreground image (adv), the first foreground image (adv)
appears in the image space 1100. Further, when the
reference time T of the client terminal becomes a display
start time Tmov (Tmov=10sec.) of the second foreground
image (mov), the second foreground image
(mov) appears in the image space 1100.
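
As a minimal sketch of this temporal arrangement, the following lists which images have appeared at a given reference time T. The function name visible_images and the dictionary layout are assumptions made for illustration. The start times are those given for Tbg, Tadv, and Tmov.

```python
from typing import Dict, List

# Display start times in seconds, as given for figure 11(b).
DISPLAY_START_SEC: Dict[str, float] = {"bg": 0.0, "adv": 5.0, "mov": 10.0}


def visible_images(reference_time_sec: float) -> List[str]:
    """Return the images that have appeared by reference time T."""
    return [name for name, start in DISPLAY_START_SEC.items()
            if reference_time_sec >= start]


if __name__ == "__main__":
    for t in (0, 5, 10):
        print(t, visible_images(t))
```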
[0013] In order to actually perform the process of
simultaneously playing the plural pieces of media data for
display at the client terminal, information (scene
description data) for combining the respective media data
(i.e., the background image (bg), the first foreground
image (adv), and the second foreground image (mov)) is
required. The scene description data designates the
temporal arrangement (refer to figure 11(b)) and the
spatial arrangement (refer to figure 11(a)) of the
respective media data. Further, the contents of the scene
description data may be described in a language
standardized by the W3C (World Wide Web Consortium), such
as "SMIL (Synchronized Multimedia Integration
Language)" or "HTML (Hyper Text Markup Language) +
TIME (Timed Interactive Multimedia Extensions)".
[0014] Hereinafter, a description will be given of the 
SMIL as one of the languages expressing the scene de- 
scription data. 

[0015] Figure 12 is a diagram for explaining an example
of contents of scene description data according to
the SMIL.

[0016] In figure 12, character strings described at the
heads of the respective rows of the scene description
SD, i.e., <smil>, </smil>, <head>, </head>, <layout>,
</layout>, <root-layout>, <region>, <body>, <par>,
</par>, <video>, and the like are called "elements", and
declare the contents of descriptions which follow the
elements.

[0017] For example, the smil element and the /smil
element declare that the rows positioned between the row
710a including the smil element and the row 710b in- 
cluding the /smil element are described according to the 
SMIL. The head element and the /head element declare 
that the rows positioned between the row 720a including 
the head element and the row 720b including the /head 
element describe information for defining the regions 
where the respective images (bg), (adv), and (mov) are 
placed in the image space shown in figure 11(a). Fur- 
ther, the layout element and the /layout element declare 
that the rows 701 to 703 including information relating 
to the positions of the background image and the fore- 
ground images to be played in parallel with each other 
(at the same time) are placed between the row 730a in- 
cluding the layout element and the row 730b including 
the /layout element. 

[0018] Furthermore, the root-layout element 701a
declares that the description in the row 701 including this
element designates the image to be displayed as the 
background image (entire scene) and designates the 
size of the background image. The region element 702a 
(703a) declares that the description in the row 702 (703) 
including this element designates the size of one rec- 
tangle region where the foreground image is placed, and 
the position of the rectangle region in the entire scene 
(image space). 

[0019] The body element and the /body element de- 
clare that the rows positioned between the row 740a in- 
cluding the body element and the row 740b including 
the /body element describe information (URL) indicating
the location of the media data to be played and information
relating to the time when the media data is to be
displayed. Further, the par element and the /par element 
declare that the rows 704 and 705 including media ele- 
ments and attribute information relating to the media da- 
ta to be played in parallel with each other (at the same 
time) are grouped and placed between the row 750a in- 
cluding the par element and the row 750b including the 
/par element. 

[0020] Each of the video elements 704a and 705a de- 
clares that the description in the row including this ele- 
ment designates video data. 

[0021] Furthermore, character strings "id", "width", 
"height", "left", "top", "src", "begin", and the like which 
follow the above-mentioned root-layout element, region 
element, and video element are called "attributes", and 
designate detailed information in the rows including the 
respective elements. 

[0022] To be specific, the id attributes in the rows 701,
702, and 703 including the root-layout element, the re- 
gion element, and the region element designate the me- 
dia data, i.e., the background image, the first foreground 
image, and the second foreground image, respectively. 
[0023] Further, the width attribute and the height at- 
tribute in the row 701 including the root-layout element 
701a designate the width and the height of the back- 
ground image (entire scene), and the size of the back- 
ground (entire scene) is designated such that the width 
is 300 points (width="300") and the height is 200 points
(height="200").

[0024] Further, the width attribute and the height at- 
tribute in the row 702 (703) including the region element 
702a (703a) designate the height and the width of the 
corresponding rectangle region, and the left attribute 
and the top attribute designate the position of the upper 
left corner of the rectangle region with respect to the
upper left corner of the entire scene.
[0025] For example, in the row 702 including the re- 
gion element, the id attribute (id=adv) designates the 
first rectangle region 1120 (refer to figure 11(a)) where
the media data corresponding to the region attribute
value (region=adv) is displayed. The position of the upper
left corner of this first rectangle region is designated by
the left attribute (left=0) and the top attribute (top=150),
that is, it is set at a distance of 0 points in the horizontal
direction and 150 points in the vertical direction from the
upper left corner of the image space as a reference
point. Further, the size of this first rectangle region is
designated by the width attribute (width=300) and the
height attribute (height=50), that is, the first rectangle
region is 300 points wide and 50 points high.
[0026] In the row 703 including the region element, 
the id attribute (id=mov) designates the second rectan- 
gle region 1130 (refer to figure 11(a)) where the media 
data corresponding to the region attribute value (re- 
gion=mov) is displayed. The position of the upper left 
corner of this second rectangle region is designated by
the left attribute (left=50) and the top attribute (top=0),
that is, it is set at a distance of 50 points in the horizontal
direction and 0 points in the vertical direction from the
upper left corner of the image space as a reference
point. Further, the size of this second rectangle region
is designated by the width attribute (width=200) and the
height attribute (height=150), that is, the second
rectangle region is 200 points wide and 150 points high.
[0027] The arrangement information described in the
row 702 including the region element is applied to the
media data which is designated by the region attribute
value (region=adv) in the row 704 including the video
element, and the arrangement information described in
the row 703 including the region element is applied to
the media data which is designated by the region
attribute value (region=mov) in the row 705 including the
video element.

[0028] Further, the src attribute in the row 704 (705) 
including the video element 704a (705a) designates the 
transmission scheme and the storage location of the 
media data on the server. The information designated 
by the src attribute is required to request media data 
from the server because the SMIL data is not provided 
with the media data such as video. 
[0029] In the row 704 (705) including the video ele- 
ment, rtsp (real time streaming protocol), which is a pro- 
tocol (procedure) for exchanging a data request mes- 
sage between the transmitting end and the receiving 
end, is designated as a transmission scheme. In the row
704 including the video element, data (adv.mpg) stored
in a server (s2.com) is designated as media data
corresponding to the first foreground image (adv). In the row
705 including the video element, data (mov.mpg) stored
in a server (s3.com) is designated as media data
corresponding to the second foreground image (mov).
[0030] Therefore, at the client terminal, messages re- 
questing the media data (adv.mpg) and the media data 
(mov.mpg) are issued to the server (s2.com) and the
server (s3.com) designated by the descriptions in the 
rows 704 and 705 including the video elements, respec- 
tively, by using the RTSP (Real Time Streaming Proto- 
col) which is the media data transmission protocol (pro- 
cedure). The media data are transmitted and received 
by using the RTP (Realtime Transport Protocol). 
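
As a brief sketch of how a client might split a src attribute value into the transmission scheme, the server, and the stored data name, the standard-library URL parser can be used. The function name parse_src is a hypothetical helper introduced here. The URL is assumed from the server and file names given in the description.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_src(src: str) -> Tuple[str, str, str]:
    """Split a src value into (transmission scheme, server, data name)."""
    parts = urlparse(src)
    return parts.scheme, parts.hostname or "", parts.path.lstrip("/")


if __name__ == "__main__":
    print(parse_src("rtsp://s2.com/adv.mpg"))
    print(parse_src("rtsp://s3.com/mov.mpg"))
```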
[0031] Furthermore, the begin attribute in the row 704
(705) including the video element designates the time
to start display of the media data in the case where the
time to start display of the scene is a starting point (0
sec.). The temporal arrangement of each media data
depends on the begin attribute and the like. In the
description in the row 704 including the video element, the
begin attribute is set at 5 sec. (begin="5"). That is, the
temporal arrangement of the first foreground image is
designated such that display of this image will be started
five seconds after display of the scene is started. In the
description in the row 705 including the video element,
the begin attribute is set at 10 sec. (begin="10"). That
is, the temporal arrangement of the second foreground
image is designated such that display of this image will
be started ten seconds after display of the scene is
started.
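
The elements and attributes described in paragraphs [0016] to [0031] can be illustrated with a short parsing sketch. The SMIL text below is a reconstruction assembled from the attribute values stated in the description, not a copy of figure 12, so its exact layout and attribute order are assumptions. The function name parse_scene is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the attribute values in the description (assumption:
# the real figure may differ in layout and attribute order).
SMIL_TEXT = """<smil>
  <head>
    <layout>
      <root-layout id="bg" width="300" height="200"/>
      <region id="adv" left="0" top="150" width="300" height="50"/>
      <region id="mov" left="50" top="0" width="200" height="150"/>
    </layout>
  </head>
  <body>
    <par>
      <video src="rtsp://s2.com/adv.mpg" region="adv" begin="5"/>
      <video src="rtsp://s3.com/mov.mpg" region="mov" begin="10"/>
    </par>
  </body>
</smil>"""


def parse_scene(smil_text):
    """Extract scene size, region geometry, and video entries."""
    root = ET.fromstring(smil_text)
    root_layout = next(root.iter("root-layout"))
    scene_size = (int(root_layout.get("width")), int(root_layout.get("height")))
    # Spatial arrangement: one entry per region element, keyed by id.
    regions = {
        r.get("id"): {k: int(r.get(k)) for k in ("left", "top", "width", "height")}
        for r in root.iter("region")
    }
    # Temporal arrangement and storage location: one entry per video element.
    videos = [
        (v.get("src"), v.get("region"), float(v.get("begin")))
        for v in root.iter("video")
    ]
    return scene_size, regions, videos


if __name__ == "__main__":
    size, regions, videos = parse_scene(SMIL_TEXT)
    print(size)
    print(regions)
    print(videos)
```

The region attribute of each video entry is the key that links its begin time and src value to the geometry stored under the matching region id.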

[0032] Next, a description will be given of a conven- 
tional data reception apparatus mounted on a personal 
computer as an example of the above-mentioned client
terminal. 

[0033] Figure 13 is a block diagram for explaining the
data reception apparatus. 

[0034] The data reception apparatus 901 obtains, as
scene description data, SMIL data shown in figure 12
from the server, and obtains media data designated by
the SMIL data from the server, and performs playback
and display of the obtained media data.
[0035] To be specific, the data reception apparatus
901 includes a plurality of data reception units 902a and
902b for receiving image data (media data) Dm1 and
Dm2 corresponding to the respective images constituting
a scene, and outputting these image data; a plurality
of image decoding units 903a and 903b for decoding the
image data Dm1 and Dm2 outputted from the respective
data reception units 902a and 902b to output decoded
image data Dd1 and Dd2; a plurality of frame memories
904a and 904b for storing, in units of frames, the decoded
image data Dd1 and Dd2 supplied from the respective
image decoding units 903a and 903b; and a display
unit 905 for receiving the decoded image data Dd1 and
Dd2 read from the respective frame memories 904a and
904b, and combining the image data corresponding to
the respective images to construct one scene, on the
basis of control data Dc1, and displaying the scene.
[0036] The data reception apparatus 901 further includes an
SMIL request/reception unit 906 for outputting an SMIL
request signal Srd to request SMIL data Ds from a
predetermined remote server on the basis of third control
data Dc3, and receiving the SMIL data Ds from the
remote server to analyze it; a control data generation
unit 907 for receiving SMIL analysis data Da obtained
by the analysis on the SMIL data, and storing, as first
control data Dc1, information relating to spatial
arrangement and temporal arrangement of each image
corresponding to each video element, and storing, as second
control data Dc2, information relating to a transmission
scheme and a storage place for the image data (media
data) corresponding to each image; a data
request/reception unit 908 for outputting a data request signal Sr
to request image data from the remote server on the
basis of the control data Dc2 from the control data
generation unit 907, receiving an acknowledge signal Sack to
the request, and outputting data Sm obtained from the
acknowledge signal, to the control data generation unit
907; and a clock circuit 909 for providing the respective
components of the data reception apparatus 901 with time
information.

[0037] The data reception apparatus 901 has as many
data reception units, image decoding units, and
frame memories as there are pieces of image
data (media data) to be received. The data
request/reception unit 908 requests scene description data for
playing a predetermined scene, according to user
operation.

[0038] Hereinafter, the operation of the data reception 
apparatus 901 will be described. 

[0039] Figure 14 is a diagram for explaining the flow 
of a procedure by which the data reception apparatus 
901 obtains media data from the server, illustrating an 
example of RTSP (Real Time Streaming
Protocol).

[0040] It is assumed that the data reception apparatus 
901 is mounted on a personal computer as a client ter- 
minal, and the data reception apparatus 901 is supplied 
with the SMIL data shown in figure 12 as scene
description data SD.

[0041] When the user, who is viewing a home page 
described by HTML (Hyper Text Markup Language) us- 
ing a Web browser installed on the personal computer, 
clicks a region on the home page linked to predeter- 
mined SMIL data Ds, the data reception apparatus 901 
of the client terminal issues an SMIL request command 
(GET http://s1.com/scene.smil) C1 for requesting the
SMIL data Ds. This command C1 requests the server
(s1.com) 13a to distribute the SMIL data by HTTP.
[0042] On receipt of the SMIL request command C1,
the server 13a issues an acknowledge (HTTP/1.0 OK)
indicating that the command has been accepted, to the
client terminal, and transmits the SMIL data (scene.smil)
Ds to the client terminal.

[0043] In the data reception apparatus 901 of the cli- 
ent terminal, the SMIL request/reception unit 906 re- 
ceives the SMIL data Ds, and analyzes the SMIL data 
Ds. 

[0044] The SMIL analysis data Da obtained by the 
analysis on the SMIL data is stored in the control data 
generation unit 907. 

[0045] That is, the control data generation unit 907 
holds information relating to the size of the background 
image (entire scene) described as the root-layout ele- 
ment, or information relating to the src attribute, top at- 
tribute, left attribute, width attribute, height attribute, and 
begin attribute, described as the video element. To be 
specific, the src attribute information includes informa- 
tion indicating the storage place of each image data, and 
each of the top attribute information and the left attribute 
information includes information about the position of 
the rectangle region where the foreground image is 
placed, with the upper left edge of the scene as a refer- 
ence point. The width attribute information and the 
height attribute information include information about 
the size (width and height) of the rectangle region in the 
horizontal direction and the vertical direction, respec- 
tively. The begin attribute information includes a display 
start time to start display of the media data correspond- 
ing to each video element. 

[0046] The display unit 905 starts the process of cre- 
ating a scene and displaying it, on the basis of the con- 
tents stored in the control data generation unit 907. To 
be specific, the background image (bg) corresponding
to the root-layout element is displayed over the image
space 1100 upon starting the display process. At this 
time, the time information outputted from the clock cir- 
cuit 909 is set at zero. 

[0047] Since, in the SMIL data Ds, the display start
time of the first foreground image (adv) is set at five
seconds and the display start time of the second foreground
image (mov) is set at ten seconds, the display unit 905
does not perform the process of combining the image
data with reference to the frame memories 904a and
904b, during the period from 0 seconds to five seconds.
[0048] When the time information outputted from the
clock circuit 909 becomes 5 seconds, exchange of a
message requesting the image data (adv.mpg)
corresponding to the first foreground image is performed
between the data request/reception unit 908 and the
second server (s2.com) 13b, on the basis of the src attribute
of the video element 704a stored in the control data
generation unit 907, by using RTSP (Real Time Streaming
Protocol) as a communication protocol. Thereafter, the
server transmits the image data (adv.mpg) using RTP
(Realtime Transport Protocol).

[0049] To be specific, as shown in figure 14, the data
reception apparatus 901 of the client terminal issues a
command (DESCRIBE rtsp://s2.com/adv.mpg) C2
requesting specific information relating to the media data
corresponding to the first foreground image (adv) (e.g.,
coding condition, existence of plural candidate data,
etc.), to the second server (s2.com) 13b.

[0050] On receipt of the command C2, the second
server 13b issues an acknowledge (RTSP/1.0 OK) R2 
indicating that the command has been accepted, to the 
client terminal, and transmits SDP (Session Description 
Protocol) information to the client terminal. 

[0051] Next, the data reception apparatus 901 of the
client terminal issues a setup request command (SETUP
rtsp://s2.com/adv.mpg) C3 which requests the
second server (s2.com) 13b to set up provision of the media
data corresponding to the first foreground image (adv),
to the second server 13b. Upon completion of setup for
the media data, the second server 13b issues an
acknowledge (RTSP/1.0 OK) R3 indicating that the
command C3 has been accepted, to the client terminal.
[0052] When the data reception apparatus 901 of the
client terminal issues a data request command (PLAY
rtsp://s2.com/adv.mpg) C4 requesting the media data
corresponding to the first foreground image (adv), to the
second server (s2.com) 13b, the second server 13b
issues an acknowledge (RTSP/1.0 OK) R4 indicating that
the command C4 has been accepted, to the client
terminal. Thereafter, the second server 13b stores the
media data Dm1 corresponding to the first foreground
image (adv.mpg) in RTP packets, and successively
transmits the RTP packets to the client terminal.
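
The command sequence C2 to C4 (DESCRIBE, SETUP, PLAY) can be sketched as the construction of three request messages. This is a simplified illustration only: the function names are hypothetical, and the messages carry just a request line and a CSeq header. An actual RTSP exchange would also need, for example, a Transport header on SETUP and a Session header on PLAY, which are omitted here.

```python
from typing import List


def rtsp_request(method: str, url: str, cseq: int) -> str:
    """Build a minimal RTSP/1.0 request (request line plus CSeq only)."""
    return f"{method} {url} RTSP/1.0\r\nCSeq: {cseq}\r\n\r\n"


def request_sequence(url: str, first_cseq: int = 1) -> List[str]:
    """Return the DESCRIBE, SETUP, and PLAY requests in the order issued."""
    methods = ("DESCRIBE", "SETUP", "PLAY")
    return [rtsp_request(m, url, first_cseq + i) for i, m in enumerate(methods)]


if __name__ == "__main__":
    for message in request_sequence("rtsp://s2.com/adv.mpg"):
        print(message.splitlines()[0])
```

Each request is answered by the server before the next one is issued, so the three round trips add up to part of the delay before the media data becomes displayable.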

[0053] The media data Dm1 is received by the
corresponding data reception unit 902a to be output to the
corresponding image decoding unit 903a. The image
decoding unit 903a decodes the media data, and the
decoded media data Dd1 is stored in the corresponding
frame memory 904a in units of frames. At this point of
time, playback of the media data Dm1 becomes
possible. However, three seconds have passed from when
the client terminal started the request for the media data
Dm1 from the server (i.e., when the output of the counter
was five seconds) to when the client terminal and the
server exchange the message.

[0054] In this way, since the client terminal exchanges 
the message with the server to obtain the media data 
from the server, the time when playback of the first fore- 
ground image at the client end becomes possible is be- 
hind the display start time of the first foreground image 
described in the SMIL data. 

[0055] Therefore, in the display unit 905, the first fore- 
ground image is displayed when three seconds have 
passed from the display start time of the first foreground 
image described in the SMIL data. 
[0056] That is, when the time information from the 
clock circuit 909 reaches 8 seconds, it is judged whether 
one frame of decoded image data of the foreground im- 
age is stored in the frame memory 904a or not. When 
one frame of decoded image data is stored, the first fore- 
ground image is superimposed on the background im- 
age for display. 
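
The timing described above can be written as simple arithmetic. In this sketch the function name is hypothetical, and the three-second latency is the example value used in the description. In the conventional apparatus the request is issued at the designated display start time, so the image becomes displayable only after the latency has elapsed.

```python
def conventional_display_time(begin_sec: float, latency_sec: float) -> float:
    """Time at which display actually starts when the request is issued
    at the designated display start time (conventional behaviour)."""
    request_time_sec = begin_sec  # request issued at the begin time
    return request_time_sec + latency_sec


if __name__ == "__main__":
    # First foreground image: begin = 5 s, latency = 3 s
    print(conventional_display_time(5.0, 3.0))  # 8.0
```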

[0057] When the image data are video data, the im- 
age data are successively input to the data reception 
unit 902a and successively decoded by the image de- 
coding unit 903a, and the decoded image data are suc- 
cessively stored in the frame memory 904a in units of 
frames. In the display unit 905, the image data corre- 
sponding to the respective frames, which are stored in 
the frame memory 904a, are successively combined 
with the data of the background image for display. 
[0058] When the time information outputted from the 
clock circuit 909 reaches 10 seconds, exchange of a 
message requesting the image data (mov.mpg) corre- 
sponding to the second foreground image is performed 
between the data request/reception unit 908 and the 
third server (s3.com) 13c, on the basis of the src at- 
tribute of the video element 705a stored in the control 
data generation unit 907, by using RTSP (Real Time 
Streaming Protocol) as a communication protocol. 
Thereafter, the server transmits the image data (mov. 
mpg) by using RTP (Realtime Transport Protocol). 
[0059] To be specific, as shown in figure 14, the data
reception apparatus 901 of the client terminal issues a 
command (DESCRIBE rtsp://s3.com/mov.mpg) C5 re- 
questing specific information relating to the media data 
corresponding to the second foreground image (mov) 
(e.g., coding condition, existence of plural candidate da- 
ta, etc.), to the third server (s3.com) 13c. 
[0060] On receipt of the command C5, the third server 
13c issues an acknowledge (RTSP/1.0 OK) R5 indicating
that the command has been accepted, to the client
terminal, and transmits SDP (Session Description Pro- 
tocol) information to the client terminal. 
[0061] Next, the data reception apparatus 901 of the
client terminal issues a setup request command (SETUP
rtsp://s3.com/mov.mpg) C6 requesting the third
server (s3.com) 13c to set up provision of the media data
(image data) corresponding to the second foreground
image (mov). Upon completion of setup for the media
data, the third server 13c issues an acknowledge
(RTSP/1.0 OK) R6 indicating that the command C6 has
been accepted, to the client terminal.
[0062] When the data reception apparatus 901 of the
client terminal issues a data request command (PLAY
rtsp://s3.com/mov.mpg) C7 requesting the media data
corresponding to the second foreground image (mov),
to the third server (s3.com) 13c, the third server 13c
issues an acknowledge (RTSP/1.0 OK) R7 indicating that
the command C7 has been accepted, to the client
terminal. Thereafter, the third server 13c stores the media
data Dm2 corresponding to the second foreground
image (mov.mpg) in RTP packets, and successively
transmits the RTP packets to the client terminal.

[0063] The media data Dm2 is received by the
corresponding data reception unit 902b to be output to the
corresponding image decoding unit 903b. The image
decoding unit 903b decodes the media data Dm2 in like
manner as the decoding process for the media data
Dm1, and the decoded media data Dd2 is stored in the
corresponding frame memory 904b in units of frames.
At this point of time, playback of the media data Dm2
becomes possible. However, a predetermined time has
passed from when the client terminal started the request
for the media data Dm2 from the server (i.e., when the
output of the counter was ten seconds) to when the
client terminal and the server exchange the message.
[0064] In this way, since the client terminal exchanges
the message with the server to obtain the media data
from the server, the time when playback of the second
foreground image at the client end becomes possible is
behind the display start time of the second foreground
image described in the SMIL data.
[0065] Therefore, in the display unit 905, the second
foreground image is displayed when three seconds
have passed from the display start time of the second
foreground image described in the SMIL data.
[0066] At this time, in the display unit 905, the
background image (bg) is displayed in the background
display region 1110 in the image space 1100 (refer to figure
11(a)), the first foreground image (adv) is displayed in
the first foreground display region 1120, and the second
foreground image (mov) is displayed in the second
foreground display region 1130. That is, a composite image
comprising the background image (bg) and the first and
second foreground images (adv and mov) is displayed
in the image space 1100.

[0067] However, the conventional data reception
apparatus 901, which issues the image data request
message to the server on the basis of the contents of the
SMIL scene description, has the following drawbacks.
[0068] In the scene description, the begin attribute at- 
tached to the first video element 704a indicates that the 
start time of the process to display the first foreground
image (adv) is when five seconds have passed from the 
display start time of the entire scene. Further, the begin 
attribute attached to the second video element 705a in- 
dicates that the start time of the process to display the 
second foreground image (mov) is when ten seconds 
have passed from the display start time of the entire 
scene. Therefore, in the conventional data reception ap- 
paratus 901 mounted on the client terminal (reception 
terminal), the data request message requesting the im- 
age data corresponding to the first foreground image is 
issued to the second server 13b when five seconds have
passed after the display start time of the entire scene, 
and the data request message requesting the image da- 
ta corresponding to the second foreground image is is- 
sued to the third server 13c when ten seconds have 
passed after the display start time of the entire scene. 
[0069] In this case, there is a delay from when the cli- 
ent terminal requests the image data from the server to 
when the image data from the server becomes display- 
able at the client terminal. For example, this delay cor- 
responds to the time required for the message ex- 
change by RTSP between the server and the client ter- 
minal, or the time required for handling the command 
from the client terminal at the server. 
[0070] So, in the conventional data reception appara- 
tus 901, when a predetermined latency time (in this 
case, three seconds) has passed from the start time of 
data request to the server, image display is performed 
on the basis of the image data stored in the frame mem- 
ory. 
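
The timing described in paragraphs [0068] to [0070] can be illustrated with a minimal Python sketch. It models only the conventional behavior: each request is issued exactly at the time given by the begin attribute, so each image becomes displayable one latency period later. The function name and the dictionary layout are illustrative assumptions; the begin times (5 s and 10 s) and the 3-second latency are the values used in the example above.

```python
# Illustrative model only; the names here are hypothetical.
LATENCY_S = 3  # time from data request to displayable data, per the example


def conventional_display_times(begin_times, latency=LATENCY_S):
    """Return the actual display time of each element when the request
    is issued exactly at the begin time given in the scene description."""
    return {name: begin + latency for name, begin in begin_times.items()}


begin = {"adv": 5, "mov": 10}  # begin attributes of the two video elements
actual = conventional_display_times(begin)

# Each image appears `latency` seconds later than designated.
delay = {name: actual[name] - begin[name] for name in begin}
```

Under these assumptions both foreground images are displayed three seconds later than the scene description designates, which is the drawback discussed in the following paragraphs.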

[0071] As a result, in the data reception apparatus
901, it is difficult to display the media data corresponding
to each video element at the time designated by the 
scene description, i.e., at the time indicated by the begin 
attribute included in the video element. 
[0072] Further, the time required from the request for 
the image data to the storage of the image data in the 
frame memory depends on the network condition, the 
number of messages to exchange, and the like. There-
fore, the temporal relationship between plural image
data varies, resulting in difficulty in maintaining
synchronization between the plural image data.
[0073] For example, according to the scene descrip- 
tion Sd shown in figure 12, display of the image corre- 
sponding to the second video element 705 should be 
started five seconds after display of the first video ele- 
ment 704 is started. However, when the time from when 
the data reception apparatus 901 requests the image 
data from the server to when the image data is actually 
stored in the frame memory of the apparatus 901 varies 
due to various factors such as congestion of the net- 
work, there is the possibility that the image correspond- 
ing to the video element 705 is not displayed after five 
seconds from when display of the image corresponding 
to the video element 704 is started. This situation will be 
a serious problem when the scene is a composite image 
comprising plural image data related to each other.



[0074] Furthermore, when the media data is transmit-
ted through a network for which a band width (i.e., a con-
stant data transmission rate) is not assured, such as the In-
ternet, the image decoding unit of the data reception ap-
paratus 901 must wait from several seconds to ten-odd
seconds until a predetermined quantity of received
image data is stored in the data buffer, before starting
decoding of the received image data. The process of
storing a predetermined quantity of received image data
in the data buffer of the data reception unit until decoding
of the image data is started by the image decoding unit
is called "prebuffering".

[0075] When the prebuffering is not performed, the
decoding process in the data reception apparatus is
easily affected by jitter in the network (fluctuations in
transmission rate). For example, when decoding is per-
formed for every predetermined quantity of image data,
the image data to be decoded may not yet be stored at the
time decoding is to be performed, resulting in the state
where playback of the image data is interrupted.
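
As a rough illustration of prebuffering as defined above, the following Python sketch withholds data from the decoder until a predetermined quantity has been received. The class name, the byte-count threshold, and the chunk interface are all assumptions made for this example and do not describe the actual data reception unit.

```python
from collections import deque


class Prebuffer:
    """Hold received data until a threshold is reached, then allow decoding."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes  # predetermined quantity (assumed unit: bytes)
        self.chunks = deque()
        self.buffered = 0
        self.started = False  # becomes True once prebuffering is complete

    def receive(self, chunk):
        """Store one received chunk and check whether decoding may start."""
        self.chunks.append(chunk)
        self.buffered += len(chunk)
        if self.buffered >= self.threshold_bytes:
            self.started = True

    def next_for_decoder(self):
        """Return the next chunk, or None if decoding may not proceed."""
        if not self.started or not self.chunks:
            return None
        chunk = self.chunks.popleft()
        self.buffered -= len(chunk)
        return chunk


buf = Prebuffer(threshold_bytes=8)
buf.receive(b"abcd")
first = buf.next_for_decoder()   # still prebuffering, nothing is released
buf.receive(b"efgh")
second = buf.next_for_decoder()  # threshold reached, decoding may start
```

In this sketch a larger threshold absorbs more jitter but delays the start of decoding, which is why the prebuffering time has to be taken into account when deciding when to request the data.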

[0076] Accordingly, when the time required for ex-
change of messages with the server or prebuffering is
considered, the conventional data reception apparatus
901, which issues a message requesting each image
data to the server at the display time of the image data
described in the SMIL data, cannot perform normal
scene playback according to the scene description.
[0077] Moreover, an appropriate prebuffering time
varies for every bit stream corresponding to each image
data (coded data of each image data). Therefore, the
reception terminal (data reception apparatus) cannot
set an appropriate prebuffering time, resulting in the
possibility that excess or deficiency of image data in the
buffer of the data reception unit (i.e., overflow or under-
flow of the buffer) may occur during decoding on the im-
age data.

SUMMARY OF THE INVENTION 

[0078] The present invention is made to solve the
above-mentioned problems and has for its object to pro-
vide a data reception apparatus, a data reception meth-
od, and a data transmission method, by which playback
and display of plural images constituting a scene can be
started at the times designated by scene description data,
and playback and display of image data can be per-
formed without interruption regardless of jitter in the net-
work.

[0079] It is another object of the present invention to
provide a data storage medium containing a program
for making a computer perform data reception according
to the above-mentioned data reception method.
[0080] Other objects and advantages of the invention
will become apparent from the detailed description that
follows. The detailed description and specific embodi-
ments described are provided only for illustration since
various additions and modifications within the scope of
the invention will be apparent to those of skill in the art
from the detailed description.

[0081] According to a first aspect of the present inven-
tion, there is provided a data reception apparatus for ob-
taining media data which is any of video data, audio da- 
ta, and text data, and corresponds to plural elements 
constituting a scene, from data sources on a network, 
and playing the obtained media data to display the 
scene. This apparatus comprises a first reception unit 
for receiving location information indicating the locations 
of the data sources having the respective media data 
on the network, first time information indicating the play- 
back start times of the respective media data, and sec- 
ond time information for requesting the respective me- 
dia data from the corresponding data source; a time set- 
ting unit for setting a data request time to make a request 
for each media data to the corresponding data source, 
at a time by a specific time set for each media data ear- 
lier than the playback start time of the media data, on 
the basis of the first and second time information; a data 
request unit for making a request for each media data 
to the data source indicated by the location information,
at the data request time set by the time setting unit; and 
a second reception unit for receiving the media data 
supplied from the data source according to the request 
from the data request unit. Therefore, playback of each 
media data can be started on time as designated at the 
transmitting end. 

[0082] According to a second aspect of the present 
invention, in the data reception apparatus of the first as- 
pect, the first reception unit receives, as the second time 
information, time information indicating a latency time 
from when each media data is received to when the me- 
dia data is played; and the time setting unit sets the data 
request time for each media data, at a time by the laten- 
cy time earlier than the playback start time of the media 
data. Therefore, the data reception apparatus can ob- 
tain each media data from a predetermined data source 
on the network, within the latency time before playback 
of the media data. Furthermore, by setting the latency 
time at a sufficiently large value considering the condi- 
tion of the network through which the media data is 
transmitted (e.g., band width, congestion, etc.), play- 
back of the media data by the data reception apparatus 
is hardly affected by jitter in the network, thereby pre- 
venting the image display from being interrupted during 
playback of the media data. 

[0083] According to a third aspect of the present in- 
vention, in the data reception apparatus of the first as- 
pect, the first reception unit receives, as the second time 
information, time information indicating a time to make 
a request for each media data to the corresponding data 
source; and the time setting unit sets the data request 
time for each media data, at the time indicated by the 
second time information. Therefore, the data reception 
apparatus can obtain each media data from a predeter- 
mined data source on the network, within the time from 
the data request time to the playback start time. Further-
more, by setting the data request time at a time suffi-
ciently earlier than the data playback start time consid-
ering the condition of the network through which the me-
dia data is transmitted (e.g., band width, congestion,
etc.), playback of the media data by the data reception
apparatus is hardly affected by jitter in the network,
thereby preventing the image display from being inter-
rupted during playback of the media data.
[0084] According to a fourth aspect of the present in-
vention, in the data reception apparatus of the first as-
pect, the first reception unit receives, as the second time
information, time information indicating a latency time
from when each media data is received to when the me-
dia data is played; and the time setting unit sets the data
request time for each media data, at a time by the sum
of the latency time and a predetermined time earlier than
the playback start time of the media data.
Therefore, playback of each media data can be started
on time as designated at the transmitting end. Further,
playback of media data at the receiving end is hardly
affected by jitter in the network, thereby preventing im-
age display from being interrupted during playback of
the media data.

[0085] According to a fifth aspect of the present inven-
tion, in the data reception apparatus of the first aspect,
the first reception unit receives, as the second time in-
formation, time information indicating a time to make a
request for each media data to the corresponding data
source; and the time setting unit sets the data request
time for each media data, at a time by a predetermined
time earlier than the time indicated by the second time
information. Therefore, playback of each media data
can be started on time as designated at the transmitting
end. Further, playback of media data at the receiving
end is hardly affected by jitter in the network, thereby
preventing image display from being interrupted during
playback of the media data.
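
The four ways of setting the data request time described in the second to fifth aspects can be summarized in a short Python sketch. The function name, the keyword arguments, and the clamping to the scene start are assumptions made for illustration. The 10-second playback start time is taken from the begin attribute of the earlier example, and the 3-second latency and the margins are illustrative values only.

```python
def data_request_time(playback_start, *, latency=None, request_time=None, margin=0.0):
    """Compute when to request one media data item (seconds from scene start).

    Exactly one of `latency` or `request_time` plays the role of the
    second time information. `margin` stands for the additional
    predetermined time of the fourth and fifth aspects.
    """
    if (latency is None) == (request_time is None):
        raise ValueError("give exactly one of latency or request_time")
    if latency is not None:
        # Second aspect (margin == 0) or fourth aspect (margin > 0).
        t = playback_start - (latency + margin)
    else:
        # Third aspect (margin == 0) or fifth aspect (margin > 0);
        # playback_start is not needed because the request time is given directly.
        t = request_time - margin
    return max(t, 0.0)  # assumption: never earlier than the scene start


second_aspect = data_request_time(10, latency=3)
fourth_aspect = data_request_time(10, latency=3, margin=1)
third_aspect = data_request_time(10, request_time=7)
fifth_aspect = data_request_time(10, request_time=7, margin=2)
```

With these illustrative values the request is issued at 7 s for the second and third aspects, at 6 s for the fourth, and at 5 s for the fifth, in each case before the designated playback start time of 10 s.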

[0086] According to a sixth aspect of the present in-
vention, there is provided a data reception method for
obtaining media data which is any of video data, audio
data, and text data, and corresponds to plural elements
constituting a scene, from data sources on a network,
and playing the obtained media data to display the
scene. This method comprises a first reception step of
receiving location information indicating the locations of
the data sources having the respective media data on
the network, first time information indicating the play-
back start times of the respective media data, and sec-
ond time information for requesting the respective me-
dia data from the corresponding data sources; a data
request step of making a request for each media data
to the data source indicated by the location information,
at a time by a specific time set for each media data ear-
lier than the playback start time of the media data, on
the basis of the first and second time information; and
a second reception step of receiving the media data sup-
plied from the data source according to the request
made in the data request step. Therefore, playback of
media data corresponding to each of elements consti-
tuting a scene can be started on time as designated at
the transmitting end.

[0087] According to a seventh aspect of the present 
invention, in the data reception method of the sixth as- 
pect, the first reception step receives, as the second 
time information, time information indicating a latency 
time from when each media data is received to when 
the media data is played; and the data request step 
makes a request for each media data to a predeter- 
mined data source, at a time by the latency time earlier 
than the playback start time of the media data. There- 
fore, the receiving end can obtain each media data from 
a predetermined data source on the network, within the 
latency time before playback of the media data. Further- 
more, by setting the latency time at a sufficiently large 
value considering the condition of the network through 
which the media data is transmitted (e.g., band width, 
congestion, etc.), playback of the media data at the re- 
ceiving end is hardly affected by jitter in the network, 
thereby preventing the image display from being inter- 
rupted during playback of the media data. 
[0088] According to an eighth aspect of the present 
invention, in the data reception method of the sixth as- 
pect, the first reception step receives, as the second 
time information, time information indicating a data re- 
quest time to make a request for each media data to the 
corresponding data source; and the data request step 
makes a request for each media data to the data source, 
at the data request time. Therefore, the receiving end 
can obtain each media data from a predetermined data 
source on the network, within the time from the data re- 
quest time to the playback start time. Furthermore, by 
setting the data request time at a time sufficiently earlier 
than the data playback start time considering the condi- 
tion of the network through which the media data is 
transmitted (e.g., band width, congestion, etc.), play- 
back of media data at the receiving end is hardly affect- 
ed by jitter in the network, thereby preventing the image 
display from being interrupted during playback of the 
media data. 

[0089] According to a ninth aspect of the present in- 
vention, in the data reception method of the sixth aspect, 
the first reception step receives, as the second time in- 
formation, time information indicating a latency time 
from when each media data is received to when the me- 
dia data is played; and the data request step makes a 
request for each media data to a predetermined data 
source, at a time by the sum of the latency time and a 
predetermined time earlier than the playback start time 
of the media data. Therefore, playback of each media 
data can be started on time as designated at the trans- 
mitting end. Further, playback of media data at the re- 
ceiving end is hardly affected by jitter in the network, 
thereby preventing image display from being interrupted 
during playback of the media data. 
[0090] According to a tenth aspect of the present in-
vention, in the data reception method of the sixth aspect,
the first reception step receives, as the second time in-
formation, time information indicating a data request
time to make a request for each media data to the cor-
responding data source; and the data request step
makes a request for each media data to the data source,
at a time by a predetermined time earlier than the data
request time. Therefore, playback of each media data
can be started on time as designated at the transmitting
end. Further, playback of media data at the receiving
end is hardly affected by jitter in the network, thereby
preventing image display from being interrupted during
playback of the media data.

[0091] According to an eleventh aspect of the present
invention, there is provided a data transmission method
for transmitting media data which is any of video data,
audio data, and text data and corresponds to plural el-
ements constituting a scene, to a reception terminal for
playing the media data to display the scene. This meth-
od comprises a first transmission step of transmitting lo-
cation information indicating the locations of data sourc-
es having the respective media data on a network, first
time information indicating the playback start times of
the respective media data, and second time information
for requesting the respective media data from the cor-
responding data sources; and a second transmission
step of transmitting the media data to the reception ter-
minal, according to the request for the media data which
is issued from the reception terminal on the basis of the
first and second time information and the location infor-
mation. Therefore, the receiving end can obtain each
media data from a predetermined data source on the
network, on the basis of the second time information,
before playback of the media data, to start playback of
the media data on time as designated at the transmitting
end.

[0092] According to a twelfth aspect of the present in-
vention, in the data transmission method of the eleventh
aspect, the second time information is time information
indicating a latency time from when each media data is
received to when the media data is played. Therefore,
the receiving end can obtain each media data from a
predetermined data source on the network, within the
latency time before playback of the media data. Further-
more, by setting the latency time at a sufficiently large
value considering the condition of the network through
which the media data is transmitted (e.g., band width,
congestion, etc.), playback of the media data at the re-
ceiving end is hardly affected by jitter in the network,
thereby preventing the image display from being inter-
rupted during playback of the media data.

[0093] According to a thirteenth aspect of the present
invention, in the data transmission method of the elev-
enth aspect, the second time information is time infor-
mation indicating a data request time to make a request
for each media data to the corresponding data source.
Therefore, the receiving end can obtain each media da-
ta from a predetermined data source on the network,
within the time from the data request time to the play-
back start time. Furthermore, by setting the data request
time at a time sufficiently earlier than the data playback
start time considering the condition of the network
through which the media data is transmitted (e.g., band
width, congestion, etc.), playback of media data at the
receiving end is hardly affected by jitter in the network,
thereby preventing the image display from being inter-
rupted during playback of the media data.
[0094] According to a fourteenth aspect of the present
invention, there is provided a data storage medium con-
taining a data playback program to make a computer
perform a data playback process of obtaining media da-
ta which is any of video data, audio data, and text data,
and corresponds to plural elements constituting a
scene, from data sources on a network, and playing the
obtained media data to display the scene. This data
playback program comprises a first program to make the
computer perform a first process of receiving location
information indicating the locations of the data sources
having the respective media data, first time information
indicating the playback start times of the respective me-
dia data, and second time information for requesting the
respective media data from the corresponding data
sources; a second program to make the computer per-
form a second process of making a request for each me-
dia data to the data source indicated by the location in-
formation, at a time by a specific time set for each media
data earlier than the playback start time of the media
data, on the basis of the first and second time informa-
tion; and a third program to make the computer perform
a third process of receiving the media data supplied from
the data source according to the data request. There-
fore, the receiving end is permitted to perform, by soft-
ware, the process of playing media data corresponding
to each of elements constituting a scene on time as des-
ignated at the transmitting end.
[0095] According to a fifteenth aspect of the present
invention, there is provided a data storage medium
which contains a data transmission program to make a
computer perform a data transmission process of trans-
mitting media data which is any of video data, audio da-
ta, and text data and corresponds to plural elements
constituting a scene, to a reception terminal for playing
the media data to display the scene. This data transmis-
sion program comprises a first program to make the
computer perform a first process of transmitting location
information indicating the locations of data sources hav-
ing the respective media data on a network, first time
information indicating the playback start times of the re-
spective media data, and second time information for
requesting the respective media data from the corre-
sponding data sources; and a second program to make
the computer perform a second process of transmitting
the media data to the reception terminal, according to
the request for the media data which is issued from the
reception terminal on the basis of the first and second
time information and the location information. There-
fore, the transmitting end is permitted to perform, by
software, the process of transmitting each media data
to the receiving end so that playback of the media data
at the receiving end is performed on time as designated
by the transmitting end.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0096] Figure 1 is a block diagram for explaining a da- 
ta reception apparatus according to a first embodiment 
of the present invention. 

[0097] Figure 2 is a diagram illustrating the contents 
(scene description) of SMIL data supplied to the data 
reception apparatus of the first embodiment. 
[0098] Figures 3(a) and 3(b) are diagrams illustrating 
the spatial arrangement (3(a)) and the temporal ar- 
rangement (3(b)) of media data on the basis of the SMIL 
data supplied to the data reception apparatus of the first 
embodiment. 

[0099] Figure 4 is a diagram illustrating a time table 
which is created by a control data recording unit 103 in- 
cluded in the data reception apparatus of the first em- 
bodiment. 

[0100] Figure 5 is a diagram for explaining the flow of
procedure to obtain media data from a server by the data 
reception apparatus of the first embodiment. 
[0101] Figure 6 is a flowchart illustrating the process 
of calculating the time to issue a media data request 
command, in the data reception apparatus of the first 
embodiment. 

[0102] Figure 7 is a block diagram for explaining a da- 
ta reception apparatus according to a second embodi- 
ment of the present invention. 

[0103] Figure 8 is a diagram illustrating the contents 
(scene description) of SMIL data supplied to the data 
reception apparatus of the second embodiment. 
[0104] Figure 9 is a diagram for explaining, as a data 
transmission method according to the present invention, 
a method of transmitting information indicating a preb- 
uffering time (latency time), which information is includ- 
ed in SDP. 

[0105] Figure 10 is a diagram for explaining, as a data 
transmission method according to the present invention, 
a method of transmitting information indicating a preb- 
uffering time (latency time), which information is includ- 
ed in an acknowledge to a SETUP request of RTSP. 
[0106] Figures 11(a) and 11(b) are diagrams illustrat-
ing the spatial arrangement (11(a)) and the temporal ar-
rangement (11(b)) of media data on the basis of SMIL
data supplied to a conventional data reception appara-
tus.

[0107] Figure 12 is a diagram illustrating the contents
(scene description) of the SMIL data supplied to the con-
ventional data reception apparatus.
[0108] Figure 13 is a block diagram for explaining the
conventional data reception apparatus.
[0109] Figure 14 is a diagram for explaining the flow 
of procedure to obtain media data from a server by the 
conventional data reception apparatus.

DETAILED DESCRIPTION OF THE PREFERRED
EMBODIMENTS 

[Embodiment 1] 

[0110] Figure 1 is a block diagram for explaining a da-
ta reception apparatus 110 according to a first embodi-
ment of the present invention.

[0111] The data reception apparatus 110 receives
SMIL data Ds1 as scene description data, reproduces 
a composite image comprising one background image 
and two foreground images on the basis of the contents 
of the SMIL data, and displays the composite image. 
[0112] To be specific, the data reception apparatus 
110 includes an SMIL request/reception unit 102, and a 
control data generation unit 110a. The SMIL request/re- 
ception unit 102 outputs an SMIL request signal Srd for 
requesting a predetermined server to transmit SMIL da- 
ta Ds1 on the basis of third control data Dc3, receives 
the SMIL data Ds1 supplied from the server, and ana- 
lyzes the SMIL data Ds1. The control data generation 
unit 110a generates first and second control data Dc1
and Dc2 on the basis of analysis data Da1 obtained by 
the analysis on the SMIL data in the SMIL request/re- 
ception unit 102. 

[0113] The data reception apparatus 110 further in-
cludes a media data reception unit 106a for receiving
image data (media data) Dm1 corresponding to a first
foreground image from the server; a decoding unit 107a
for decoding the received image data Dm1 to output de-
coded image data Dd1; and a frame memory 108a for
storing the decoded image data Dd1 in units of frames.
Further, the data reception apparatus 110 includes a
media data reception unit 106b for receiving image data
(media data) Dm2 corresponding to a second fore-
ground image from the server; a decoding unit 107b for
decoding the received image data Dm2 to output decod-
ed image data Dd2; and a frame memory 108b for stor-
ing the decoded image data Dd2 in units of frames.
[0114] Furthermore, the data reception apparatus 110
includes a display unit 109 for reading the decoded im-
age data Dd1 and Dd2 respectively stored in the frame
memories 108a and 108b, on the basis of the first control
data Dc1 supplied from the control data generation unit
110a, combining these data with a background image
to generate a composite image, and displaying the com-
posite image. The data reception apparatus 110 further
includes a data request/reception unit 105 for outputting
a data request signal Srp for requesting data from a pre-
determined server on the basis of the second control
data Dc2 supplied from the control data generation unit
110a, and receiving an acknowledge signal Sack to the
data request from the server.

[0115] The control data generation unit 110a compris-
es a control data recording unit 103, and a trigger signal
generation unit 104. The control data recording unit 103
creates a time table in which plural items, each compris-
ing a control command to be output as control data to
the data request/reception unit 105 and the display unit
109 and information relating to the command, are ar-
ranged in order of time to execute the command, on the
basis of the analysis data Da1 from the SMIL request/
reception unit 102, and outputs time information It relat-
ing to the execution time of each control command in
order of time. On receipt of the time information It, the
trigger signal generation unit 104 sets the execution
time of the control command corresponding to each item
to start clocking operation, and outputs a trigger signal
St to the control data recording unit 103 every time the
clock time reaches the set control command execution
time. Every time the control data recording unit 103 re-
ceives the trigger signal St from the trigger signal gen-
eration unit 104, the unit 103 outputs the corresponding
control command to the data request/reception unit 105
or the display unit 109 as the control data Dc1 or Dc2.
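
The interplay between the time table and the trigger signal described above can be sketched in Python as follows. This is a simplified software model under stated assumptions, not the circuit itself: the class name, the tuple layout of each item, and the one-second tick are hypothetical, and the request times of 2 s and 7 s assume an illustrative 3-second latency before the display times of 5 s and 10 s.

```python
import heapq


class TimeTable:
    """Items of (execution time, command, info), kept in order of execution time."""

    def __init__(self):
        self._items = []
        self._seq = 0  # tie-breaker so equal times keep insertion order

    def add(self, exec_time, command, info):
        heapq.heappush(self._items, (exec_time, self._seq, command, info))
        self._seq += 1

    def due(self, clock_time):
        """Pop every command whose execution time has been reached.

        In this sketch, a non-empty result plays the role of the trigger
        signal St followed by output of the control command.
        """
        fired = []
        while self._items and self._items[0][0] <= clock_time:
            exec_time, _, command, info = heapq.heappop(self._items)
            fired.append((exec_time, command, info))
        return fired


table = TimeTable()
table.add(10, "display", "mov")
table.add(2, "request", "adv")
table.add(5, "display", "adv")
table.add(7, "request", "mov")

log = []
for clock in range(0, 11):  # clocking operation, one tick per second
    log.extend(table.due(clock))
```

The items may be added in any order; the commands are still issued in order of execution time, with each request preceding the corresponding display command.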
[0116] In figure 1, reference numeral 101a denotes a
clock circuit for supplying a reference clock to each com-
ponent of the data reception apparatus 110, and this is
identical to the clock circuit of the conventional data re-
ception apparatus 901.

[0117] The trigger signal generation unit 104 may be
implemented by a timer which is able to set plural times,
performs clocking operation on the basis of the refer-
ence clock from the clock circuit 101a, and outputs a
trigger signal every time the clock time reaches the set
time.

[0118] In this first embodiment, the data reception ap-
paratus 110 includes two data reception units, two decod-
ing units, and two frame memories, obtains media data
corresponding to two foreground images from the server
on the network, and combines the two foreground im-
ages on one background image to display a composite
image. However, the number of media data obtained
from the server on the network is not restricted to two.
For example, in the case where the data reception ap-
paratus 110 obtains three or more media data from the
server on the network, the apparatus 110 is provided
with data reception units, decoding units, and frame mem-
ories as many as the number of the media data to be
obtained.

[0119] Figure 2 is a diagram illustrating an example
of contents of the above-mentioned SMIL data, and the
data reception apparatus 110 of this first embodiment
receives the SMIL data shown in figure 2. Figures 3(a)
and 3(b) illustrate the spatial arrangement and the tem-
poral arrangement of media data as the contents of the
SMIL data shown in figure 2, respectively.

[0120] In figure 2, character strings <smil>, </smil>,
<head>, </head>, <layout>, </layout>, <root-layout>,
<region>, <body>, <par>, </par>, <video> which are de-
scribed at the heads of the respective rows of scene de-
scription SD1 are called "elements", and declare the
contents of description following the respective ele-
ments. That is, elements 210a, 210b, 220a, 220b, 230a,
230b, 240a, 240b, 250a, 250b shown in figure 2 are
identical to the elements 710a, 710b, 720a, 720b, 730a,
730b, 740a, 740b, 750a, 750b shown in figure 12, re-
spectively. Further, rows 201-203 shown in figure 2 are
identical to the rows 701-703 shown in figure 12, respec-
tively. However, rows 204 and 205 shown in figure 2 are
different from the rows 704 and 705 shown in figure 12,
respectively.

[0121] First of all, the spatial arrangement of media 
data designated by the SMIL data will be described with 
reference to figure 2. 

[0122] The root-layout element 201a designates the
size of the entire scene. That is, the root-layout element
201a indicates the size of the rectangle region where
the entire scene is displayed, that is, it indicates that the
width and the height of the rectangle region are 300
points and 200 points, respectively, by the width at-
tribute (width="300") and the height attribute
(height="200") attached to this element. Further, the id
attribute relating to this element 201 shows the back-
ground image (bg) (id="bg").

[0123] The region element 202a indicates the size of
the rectangle region where an image corresponding to
this element 202a is displayed: the width and the height
of the rectangle region are 300 points and 50 points,
respectively, as given by the width attribute (width="300")
and the height attribute (height="50") attached to the
region element 202a. Further, the region element 202a
indicates, by the left attribute (left="0") and the top
attribute (top="150") attached to it, that the upper left
corner of the rectangle region is positioned at a distance
of 0 points from the left edge of the image space 1100 and
150 points from the upper edge of the image space 1100.
Further, the id attribute attached to this element 202a
indicates the first foreground image (adv) (id="adv").

[0124] The region attribute attached to the video
element 204a indicates the first foreground image (adv)
(region="adv").

[0125] Accordingly, the rectangle region whose size
and position are designated by the region element 202a
is a region where the first foreground image (adv) is
placed (hereinafter also referred to as an adv region).
[0126] The region element 203a indicates, by the
width attribute (width="200") and the height attribute
(height="150") attached to this element, that the width
and the height of the corresponding rectangle region are
200 points and 150 points, respectively. Further, the
region element 203a indicates, by the left attribute
(left="50") and the top attribute (top="0") attached to this
element, that the upper left corner of this rectangle region
is positioned at a distance of 50 points from the left edge
of the image space 1100 and 0 points from the upper
edge of the image space 1100. The id attribute attached
to this element 203a indicates the second foreground
image (mov) (id="mov").

[0127] The region attribute attached to the video
element 205a indicates the second foreground image
(mov) (region="mov").

[0128] Accordingly, the rectangle region whose size
and position are designated by the region element 203a
is a region where the second foreground image (mov)
is placed (hereinafter also referred to as a mov region).
[0129] The bg region is a region as a background, the
adv region is a region where an advertisement or the
like is displayed, and the mov region is a region where
a moving image or the like is displayed.
[0130] Consequently, as shown in figure 3(a), the po- 
sitions of the adv region 1120, mov region 1130, and bg 
region 1110 based on the scene description SD1 are 
identical to the positions of these regions shown in figure 
11(a). 

[0131] More specifically, the predetermined image
space 1100 is the background display region (bg region)
1110 where the background image (bg) is displayed.
The first foreground display region (adv region) 1120,
where the first foreground image (adv) such as an
advertisement is placed, and the second foreground display
region (mov region) 1130, where the second foreground
image (mov) as a moving picture is placed, are located in
the background display region 1110. The sizes of the regions
where the respective images are placed and their positions
in the image space are identical to those shown in figure 11(a).
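
The spatial arrangement described above can be written down as a small data structure. The following Python sketch is illustrative only: the `Region` class and its helper methods are hypothetical names that do not appear in the specification, and only the numeric values are taken from the attributes described for rows 201 to 203.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangle in the image space, in points (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int

    def inside(self, outer: "Region") -> bool:
        """True when this rectangle lies entirely within `outer`."""
        return (self.left >= outer.left
                and self.top >= outer.top
                and self.left + self.width <= outer.left + outer.width
                and self.top + self.height <= outer.top + outer.height)

    def overlaps(self, other: "Region") -> bool:
        """True when the two rectangles share at least one point of area."""
        return (self.left < other.left + other.width
                and other.left < self.left + self.width
                and self.top < other.top + other.height
                and other.top < self.top + self.height)


# Values from the root-layout element 201a and the region elements 202a, 203a.
REGIONS = {
    "bg": Region(left=0, top=0, width=300, height=200),
    "adv": Region(left=0, top=150, width=300, height=50),
    "mov": Region(left=50, top=0, width=200, height=150),
}
```

With these values both foreground regions lie inside the bg region, and the adv and mov regions touch along the line top=150 without overlapping.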
[0132] Next, a description will be given of the temporal 
arrangement of the media data designated by the SMIL 
data shown in figure 2. 

[0133] The begin attribute (begin="5s") relating to the
video element 204a indicates that display of the image
data corresponding to this element 204a should be
started five seconds after scene display is started.
[0134] The src attribute (src="rtsp://s2.com/adv.mpg")
relating to the video element 204a indicates that
the image data corresponding to this video element
204a should be obtained by issuing a command
requesting the server (s2.com) to transmit the image data
(adv.mpg) stored in this server, by using RTSP.
[0135] On the other hand, the begin attribute
(begin="10s") relating to the video element 205a indicates
that display of the image data corresponding to this
element 205a should be started ten seconds after scene
display is started.

[0136] The src attribute (src="rtsp://s3.com/mov.mpg")
relating to the video element 205a indicates that
the image data corresponding to this video element
205a should be obtained by issuing a command
requesting the server (s3.com) to transmit the image data
(mov.mpg) stored in this server, by using RTSP.
[0137] Consequently, as shown in figure 3(b), display 
of the first foreground image (adv) is started when five 
seconds have passed from the start of scene (back- 
ground image) display, and display of the second fore- 
ground image (mov) is started when ten seconds have 
passed from the start of scene display. 
[0138] Further, each of the video elements 204a and
205a has a prebuffering attribute. The prebuffering
attribute indicates the latency time from reception of the
media data to decoding of it. For example, the
prebuffering attribute (prebuffering="7s") relating to the video
element 204a indicates that the image data (adv.mpg)
corresponding to the video element 204a should wait
seven seconds for decoding after it is received by the
data reception apparatus. The prebuffering attribute
(prebuffering="15s") relating to the video element 205a
indicates that the image data (mov.mpg) corresponding
to the video element 205a should wait fifteen seconds
for decoding after it is received by the data reception
apparatus.
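
As an illustration of how the attributes described above might be read by a receiver, the following Python sketch parses a scene description with the standard `xml.etree.ElementTree` module. The SMIL text is a reconstruction from the attribute values given in the description, not a copy of figure 2, so its exact layout is an assumption; `seconds` and `parse_videos` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the description of rows 201 to 205 (an assumption,
# not the literal content of figure 2).
SMIL_TEXT = """
<smil>
  <head>
    <layout>
      <root-layout id="bg" width="300" height="200"/>
      <region id="adv" left="0" top="150" width="300" height="50"/>
      <region id="mov" left="50" top="0" width="200" height="150"/>
    </layout>
  </head>
  <body>
    <par>
      <video region="adv" src="rtsp://s2.com/adv.mpg"
             begin="5s" prebuffering="7s"/>
      <video region="mov" src="rtsp://s3.com/mov.mpg"
             begin="10s" prebuffering="15s"/>
    </par>
  </body>
</smil>
"""


def seconds(value: str) -> int:
    """Convert a value such as "5s" to an integer number of seconds."""
    return int(value.rstrip("s"))


def parse_videos(smil_text: str) -> list:
    """Return one dict per video element, in document order."""
    root = ET.fromstring(smil_text.strip())
    return [
        {
            "region": video.get("region"),
            "src": video.get("src"),
            "begin": seconds(video.get("begin")),
            "prebuffering": seconds(video.get("prebuffering")),
        }
        for video in root.iter("video")
    ]
```

The sketch handles only whole-second values written with an "s" suffix, which is all the description uses.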

[0139] In the data reception apparatus 110 according
to this first embodiment, when the scene description
data SD1 is received, a time table considering the latency
times for the respective video elements is created and
stored in the control data generation unit 110a.
[0140] On this time table, the times to issue control
commands are set so that reception of the image data
(adv.mpg) and (mov.mpg) corresponding to the video
elements 204a and 205a is started two seconds and five
seconds before the start of scene display, respectively, and
display of the image data (adv.mpg) and (mov.mpg) is
started at times Tadv (Tadv=5sec.) and Tmov
(Tmov=10sec.), after the latency times of seven seconds
and fifteen seconds have passed from the start of
reception of the respective image data.
[0141] Figure 4 shows the contents of a time table to
be stored in the control data generation unit 110a as the
contents of the SMIL data.

[0142] The time table Tab has an item indicating the
time to perform data request or data display, an item
indicating the data request/reception unit 105 or the
display unit 109 as a control target to which a control
command is issued, and an item indicating the control
command to the control target. A plurality of events, each
having the items of time, control target, and control
command, are listed in chronological order. With respect to
the event whose control target is the data
request/reception unit 105, information designated by the src
attribute relating to the video element of the SMIL data is
described in the item of the control command. Further,
with respect to the event whose control target is the
display unit, information designated by the id, width, height,
left, and top attributes relating to the root-layout element
or the region element of the SMIL data is described in
the item of the control command.
[0143] Hereinafter, the operation of the data reception
apparatus 110 will be described.
[0144] Figure 5 is a diagram for explaining the flow of 
a procedure by which the data reception apparatus 110 
obtains media data from the server. More specifically, 
figure 5 illustrates exchange of messages between the 
data reception apparatus and the server, and transmis- 
sion of media data from the server to the data reception 
apparatus. 

[0145] It is assumed that the data reception apparatus
110 is mounted on a personal computer as a client
terminal, and the data reception apparatus 110 is supplied
with SMIL data Ds1 as data indicating the scene
description data SD1 shown in figure 2.



[0146] When the user, who is viewing a home page
described by HTML (Hyper Text Markup Language) using
a Web browser installed on the personal computer,
clicks a region on the home page linked to a
predetermined scene description SD1 (user operation), the data
reception apparatus 110 of the client terminal issues an
SMIL request command (GET http://s1.com/scene.smil)
C1 requesting SMIL data Ds1 indicating the scene
description SD1. This command C1 requests the server
(s1.com) 13a to distribute the SMIL data by HTTP.

[0147] On receipt of the SMIL request command C1,
the server 13a issues an acknowledgement (HTTP/1.0 OK)
R1 indicating that the command C1 has been accepted,
to the client terminal, and transmits the SMIL data
(scene.smil) Ds1 to the client terminal.

[0148] In the data reception apparatus 110 of the
client terminal, the SMIL request/reception unit 102
receives the SMIL data Ds1, and analyzes the SMIL data
Ds1.

[0149] The SMIL analysis data Da1 obtained by the
analysis of the SMIL data is transmitted to the control
data generation unit 110a to be stored in the control data
recording unit 103. Then, the control data recording unit
103 creates the time table Tab shown in figure 4 on the
basis of the SMIL analysis data Da1, whereby the
contents of the SMIL data are stored in the form of a time
table.

[0150] Hereinafter, the process of creating the time
table by the control data recording unit 103 will be
described briefly.

[0151] Initially, in the control data recording unit 103,
the time to issue a control command for requesting
media data corresponding to each video element is
obtained by using the display start time indicated by the
begin attribute of each video element, and the latency
time (prebuffering time) indicated by the prebuffering
attribute of each video element. The time to issue a control
command requesting media data is obtained by
subtracting the latency time from the display start time. To
be specific, the time Tpadv to issue a control command
requesting the media data (adv.mpg) corresponding to
the video element 204a is -2 sec. (5 sec. minus 7 sec.)
with reference to the scene display start time Tbg
(Tbg=0sec.), and the time Tpmov to issue a control command
requesting the media data (mov.mpg) corresponding to the
video element 205a is -5 sec. (10 sec. minus 15 sec.).

[0152] Thereafter, in the control data recording unit
103, on the basis of the SMIL analysis data Da1, the
contents of the SMIL data are sorted into two groups,
i.e., information required to request the media data
(information designated by the src attribute included in the
video element), and information required to display the
media data (information designated by the id, width,
height, left, and top attributes included in the root-layout
element or the region element).

[0153] Next, in the control data recording unit 103,
event data E1 is created on the basis of the information
to request the media data, which event data comprises
information indicating a control command to request the
media data (mov.mpg), information indicating the data
request unit as a target of the control command, and
information indicating the time to issue the control
command. Further, event data E2 is created, which
comprises information indicating a control command to request
the media data (adv.mpg), information indicating the
data request unit as a target of the control command, and
information indicating the time to issue the control
command.

[0154] Furthermore, on the basis of the information to
display the media data, the following event data are
created: event data E3 comprising information indicating a
control command to display the background image (bg),
information indicating the display unit as a target of the
control command, and information indicating the time to
issue the control command; event data E4 comprising
information indicating a control command to display the
first foreground image (adv), information indicating the
display unit as a target of the control command, and
information indicating the time to issue the control
command; and event data E5 comprising information
indicating a control command to display the second
foreground image (mov), information indicating the display
unit as a target of the control command, and information
indicating the time to issue the control command.
[0155] Thereafter, in the control data recording unit 
103, the respective event data are arranged according 
to the corresponding control command issue times 
(chronological order) to create the time table shown in 
figure 4, and the time table so created is stored. 
[0156] To be specific, in the scene description SD1
shown in figure 2, the times to request the media data
(adv.mpg) and (mov.mpg) corresponding to the video
elements 204a and 205a (the times to issue control
commands) are set at -2 sec. and -5 sec., respectively. The
display start times of the foreground images (adv) and
(mov) are set at 5 sec. and 10 sec., respectively, and
the display start time of the scene (background image)
is 0 sec. Accordingly, as shown in figure 4, on the time
table stored in the control data recording unit 103, the
first event data is the event data E1, the second event
data is the event data E2, the third event data is the
event data E3, the fourth event data is the event data
E4, and the fifth event data is the event data E5.
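
The creation of the time table can be sketched as follows. This Python fragment is a minimal illustration, not the implementation of the control data recording unit 103: the `Event` type and `build_time_table` function are hypothetical, and the command strings follow the forms quoted elsewhere in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One row of the time table (hypothetical type)."""
    time: int      # seconds relative to the scene display start time Tbg = 0
    target: str    # control target that receives the command
    command: str   # control command issued to the target


REQUEST_UNIT = "data request/reception unit 105"
DISPLAY_UNIT = "display unit 109"


def build_time_table() -> list:
    """Create the five events and arrange them in chronological order."""
    events = [
        # Display events: time is the begin attribute (0 for the scene).
        Event(0, DISPLAY_UNIT, "bg//width300/height200"),
        Event(5, DISPLAY_UNIT, "adv//left0/top150/width300/height50"),
        Event(10, DISPLAY_UNIT, "mov//left50/top0/width200/height150"),
        # Request events: time is begin minus prebuffering.
        Event(5 - 7, REQUEST_UNIT, "PLAY rtsp://s2.com/adv.mpg"),
        Event(10 - 15, REQUEST_UNIT, "PLAY rtsp://s3.com/mov.mpg"),
    ]
    return sorted(events, key=lambda event: event.time)
```

Sorting by time places the request for (mov.mpg) first and the request for (adv.mpg) second, which matches the order E1 to E5 stated above.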
[0157] Thereafter, the control data recording unit 103
outputs the information indicating the issue times of the
respective control commands (time information), from
the time table to the signal generation unit 104 in the
order listed on the time table (chronological order).

[0158] When the time information is input to the signal
generation unit 104, the time indicated by the time
information is recorded as a set time in order of reception,
and the clock starts its clocking operation.
[0159] At this time, in the data reception apparatus
110, simultaneously with the creation of the time table,
the data reception apparatus and a predetermined
server exchange messages so as to set up transmission of
image data at the server.

[0160] To be specific, as shown in figure 5, the data
reception apparatus 110 of the client terminal issues a
command (DESCRIBE rtsp://s3.com/mov.mpg) C2a
requesting specific information relating to the media data
corresponding to the second foreground image (mov)
(e.g., coding condition, existence of plural candidate
data, etc.), to the third server (s3.com) 13c.
[0161] On receipt of the command C2a, the third
server 13c issues an acknowledgement (RTSP/1.0 OK) R2a
indicating that the command has been accepted, to the
client terminal, and transmits SDP (Session Description
Protocol) information to the client terminal.
[0162] Next, the data reception apparatus 110 of the
client terminal issues a setup request command
(SETUP rtsp://s3.com/mov.mpg) C3a which requests the
third server (s3.com) 13c to set up provision of the media
data corresponding to the second foreground image
(mov), to the third server 13c. Upon completion of setup
for the media data, the third server 13c issues an
acknowledgement (RTSP/1.0 OK) R3a indicating that the
command C3a has been accepted, to the client terminal.
[0163] Thereafter, the data reception apparatus 110
of the client terminal issues a command (DESCRIBE
rtsp://s2.com/adv.mpg) C2b requesting specific
information relating to the media data corresponding to the first
foreground image (adv) (e.g., coding condition,
existence of plural candidate data, etc.), to the second server
(s2.com) 13b.

[0164] On receipt of the command C2b, the second
server 13b issues an acknowledgement (RTSP/1.0 OK) R2b
indicating that the command C2b has been accepted,
to the client terminal, and transmits SDP (Session
Description Protocol) information to the client terminal.

[0165] Next, the data reception apparatus 110 of the
client terminal issues a setup request command
(SETUP rtsp://s2.com/adv.mpg) C3b which requests the
second server (s2.com) 13b to set up provision of the
media data corresponding to the first foreground image
(adv), to the second server 13b. Upon completion of
setup for the media data, the second server 13b issues an
acknowledgement (RTSP/1.0 OK) R3b indicating that the
command C3b has been accepted, to the client terminal.
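
The order of the setup messages described above can be illustrated with a short Python sketch. It only formats request text and sends nothing; `rtsp_request` and `setup_messages` are hypothetical helper names, and the requests are deliberately minimal (a real SETUP request also carries a Transport header, which is omitted here).

```python
def rtsp_request(method: str, url: str, cseq: int) -> str:
    """Format a minimal RTSP/1.0 request: request line plus CSeq header."""
    return f"{method} {url} RTSP/1.0\r\nCSeq: {cseq}\r\n\r\n"


def setup_messages(urls: list) -> list:
    """Return DESCRIBE then SETUP for each media URL, in the order given.

    Each URL is on a different server here, so the CSeq counter restarts
    at 1 for every URL (an assumption of this sketch).
    """
    messages = []
    for url in urls:
        for cseq, method in enumerate(("DESCRIBE", "SETUP"), start=1):
            messages.append(rtsp_request(method, url, cseq))
    return messages


# Order used in the description: the mov data first, then the adv data.
MESSAGES = setup_messages(["rtsp://s3.com/mov.mpg", "rtsp://s2.com/adv.mpg"])
```

The four resulting requests correspond, in order, to the commands C2a, C3a, C2b, and C3b.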
[0166] When the time of the clock reaches a set time
stored in the signal generation unit 104, the signal
generation unit 104 generates a trigger signal St, and
outputs it to the control data recording unit 103. Since the
set times stored in the signal generation unit 104 are -5, -2,
0, 5, and 10 sec., the signal generation unit 104 outputs
a trigger signal every time the clock time reaches -5, -2,
0, 5, and 10 sec. Upon reception of every trigger signal,
the control data recording unit 103 issues the control
command included in the event on the time table, to the
corresponding control target.
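
The interaction between the clock and the time table can be simulated in a few lines. The following Python sketch is an illustration under simplifying assumptions: the clock advances in whole seconds, and the hypothetical function `run_clock` stands in for both the signal generation unit 104 (which fires a trigger at each set time) and the control data recording unit 103 (which issues the matching command).

```python
# (time in seconds, control target, control command), as on the time table.
TIME_TABLE = [
    (-5, "data request/reception unit", "PLAY rtsp://s3.com/mov.mpg"),
    (-2, "data request/reception unit", "PLAY rtsp://s2.com/adv.mpg"),
    (0, "display unit", "bg//width300/height200"),
    (5, "display unit", "adv//left0/top150/width300/height50"),
    (10, "display unit", "mov//left50/top0/width200/height150"),
]


def run_clock(time_table: list, start: int, stop: int) -> list:
    """Step a simulated clock from `start` to `stop` (inclusive).

    Whenever the clock reaches a set time, the corresponding command is
    logged as (clock time, target, command).
    """
    pending = sorted(time_table, key=lambda event: event[0])
    log = []
    for now in range(start, stop + 1):
        while pending and pending[0][0] <= now:
            _, target, command = pending.pop(0)
            log.append((now, target, command))
    return log
```

Running the clock from -5 to 10 issues the two PLAY commands before the scene starts and the three display commands at 0, 5, and 10 seconds.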

[0167] First of all, when a trigger signal outputted from
the signal generation unit 104 at time t (=-5sec.) is input
to the control data recording unit 103, the control data
recording unit 103 outputs a control command (PLAY
rtsp://s3.com/mov.mpg) C4a of the first event on the
time table, to the data request/reception unit 105 as a
target of this control command.

[0168] The data request/reception unit 105 outputs a
message by RTSP for requesting the image data
(mov.mpg), to the third server (s3.com) 13c, on the basis of
the control command (PLAY rtsp://s3.com/mov.mpg)
C4a from the control data recording unit 103.
[0169] On receipt of the message from the data
request/reception unit 105, the third server 13c transmits
the image data (mov.mpg) by RTP to the data reception
apparatus 110.

[0170] The image data (mov.mpg) Dm2 transmitted
from the server 13c is received by the media data
reception unit 106b. The image data Dm2 is a bit stream
which is compressively coded by a coding method
based on the MPEG standard or the like. The bit stream
(image data) inputted to the media data reception unit 106b
is output to the decoding unit 107b frame by frame. In
the decoding unit 107b, the bit stream is decoded frame
by frame. The decoded image data Dd2 obtained in the
decoding unit 107b is stored in the frame memory 108b
frame by frame.

[0171] When a trigger signal outputted from the signal
generation unit 104 at time t (=-2sec.) is input to the
control data recording unit 103, the control data recording
unit 103 outputs a control command (PLAY
rtsp://s2.com/adv.mpg) C4b of the second event on the time
table, to the data request/reception unit 105 as a target
of this control command.

[0172] The data request/reception unit 105 outputs a 
message by RTSP for requesting the image data (adv. 
mpg) Dm1, to the second server (s2.com) 13b, on the 
basis of the control command (PLAY rtsp://s2.com/adv. 
mpg) C4b from the control data recording unit 103. 
[0173] On receipt of the message from the data re- 
quest/reception unit 105, the second server 13b trans- 
mits the image data (adv.mpg) Dm1 by RTP to the data 
reception apparatus 110. 

[0174] The image data (adv.mpg) Dm1 transmitted
from the server 13b is received by the media data
reception unit 106a. The image data Dm1 is a bit stream
which is compressively coded by a coding method
based on the MPEG standard or the like. The bit stream
(image data) inputted to the media data reception unit 106a
is output to the decoding unit 107a frame by frame. In
the decoding unit 107a, the bit stream is decoded frame
by frame. The decoded image data Dd1 obtained in the
decoding unit 107a is stored in the frame memory 108a
frame by frame.

[0175] When a trigger signal outputted from the signal
generation unit 104 at time t (=0sec.) is input to the
control data recording unit 103, the control data recording
unit 103 outputs a control command
(bg//width300/height200) of the third event on the time table,
as control data Dc1, to the display unit 109 as a target
of this control command. The display unit 109 displays
the background image (bg) over the image space,
according to the control command (bg//width300/height200)
from the control data recording unit 103. The data of the
background image is retained by the data reception
apparatus 110 in advance. At this point of time (t=0sec.),
the display start times of the first and second foreground
images indicated by the begin attributes of the video
elements 204a and 205a, respectively, are larger than
0 sec. and, therefore, the first foreground image (adv) and
the second foreground image (mov) are not displayed in
the adv region (first foreground display region) 1120 and
the mov region (second foreground display region) 1130,
respectively.
[0176] When a trigger signal outputted from the signal
generation unit 104 at time t (=5sec.) is input to the
control data recording unit 103, the control data recording
unit 103 outputs a control command
(adv//left0/top150/width300/height50) of the fourth event on
the time table, as control data Dc1, to the display unit
109 as a target of this control command. In the display
unit 109, the decoded image data Dd1 is read frame by
frame, from the frame memory 108a, on the basis of the
control command (adv//left0/top150/width300/height50)
from the control data recording unit 103, and the first
foreground image (adv) is combined with the
background image such that it is placed on the adv region
(first foreground display region) 1120 in the image space
1100.

[0177] Further, when a trigger signal St outputted from
the signal generation unit 104 at time t (=10sec.) is input
to the control data recording unit 103, the control data
recording unit 103 outputs a control command
(mov//left50/top0/width200/height150) of the fifth event on the
time table, as control data Dc1, to the display unit 109 as
a target of this control command. In the display unit 109,
the decoded image data Dd2 is read frame by frame,
from the frame memory 108b, on the basis of the control
command (mov//left50/top0/width200/height150) from
the control data recording unit 103, and the second
foreground image (mov) is combined with the background
image and the first foreground image such that it is
placed on the mov region (second foreground display
region) 1130 in the image space 1100.
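
The combination of the foreground images with the background image can be pictured with a toy compositor. In the Python sketch below, an image is a list of rows and each "pixel" is simply a label; `blank` and `paste` are hypothetical helpers, and real decoded frames from the frame memories 108a and 108b would of course hold pixel data instead.

```python
def blank(width: int, height: int, value: str) -> list:
    """Create a height x width image filled with `value`."""
    return [[value] * width for _ in range(height)]


def paste(canvas: list, image: list, left: int, top: int) -> list:
    """Copy `image` onto `canvas` with its upper left corner at (left, top)."""
    for y, row in enumerate(image):
        canvas[top + y][left:left + len(row)] = row
    return canvas


# t = 0 sec.: background over the whole 300 x 200 image space.
SCENE = blank(300, 200, "bg")
# t = 5 sec.: adv//left0/top150/width300/height50
paste(SCENE, blank(300, 50, "adv"), left=0, top=150)
# t = 10 sec.: mov//left50/top0/width200/height150
paste(SCENE, blank(200, 150, "mov"), left=50, top=0)
```

After the third step the background remains visible only in the two 50-point-wide strips to the left and right of the mov region.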
[0178] Figure 6 is a flowchart illustrating a specific
process of calculating the time to issue a control
command requesting media data in the control data
recording unit 103. Hereinafter, the calculation process will be
described briefly. In the flowchart shown in figure 6, the
first set time T1[n] is the time to issue a control command
requesting media data corresponding to the n-th video
element in the scene description SD1 (hereinafter
referred to simply as media data request time), and the
second set time T2[n] is the time to display media data
corresponding to the n-th video element.
[0179] Furthermore, figure 6 illustrates a process of
calculating the media data request time T1[n] by
introducing, in addition to the prebuffering time, the time C
required from when the control command requesting the
media data is issued to the server to when the client
receives the media data.
[0180] First of all, in the control data recording unit
103, the first internal variable n used for time calculation
is set at zero (step S501). The variable n increases by
1 every time the time calculation on a video element in
the scene description SD1 is completed.
[0181] Next, a video element to be subjected to the
time calculation process (target video element) is
decided on the basis of the analysis data Da1 from the SMIL
request/reception unit 102 (step S502). Usually, a target
video element is successively selected from the head of
plural video elements which are arranged in
predetermined order in the scene description SD1. Therefore,
the video element 204a is selected first, between the
video elements 204a and 205a.
[0182] Subsequently, in the control data recording
unit 103, the value "7" of the prebuffering attribute of the
video element 204a is set as the second internal
variable P used for the time calculation process, and the
value "5" of the begin attribute of the video element 204a
is set as the third internal variable B used for the time
calculation process (step S503).
[0183] Thereafter, in the control data recording unit
103, the first set time T1[n] is calculated on the basis of
the following formula (1) (step S504).

T1[n] = B - P - C   (1)

wherein C is a constant indicating the time required from
when the data request/reception unit 105 issues a
control command requesting media data to when the data
reception unit receives the media data, and the value of
the constant C is set by predicting the time from the
request control command issue time to the data reception
time. In this first embodiment, the constant C is set at 0
sec.

[0184] Accordingly, when 5, 7, 0 (sec.) based on the
scene display start time (0sec.) are assigned to the
variables B, P, C in formula (1), respectively, the first set
time T1[0] corresponding to the first video element 204a
becomes -2, and the time to issue the control command
requesting the media data of the first foreground image
(adv) is two seconds before the scene display start time
(0sec.).
[0185] Further, in the control data recording unit 103,
the second set time T2[n] is calculated on the basis of
the following formula (2) (step S505).

T2[n] = B   (2)

[0186] As the result, the second set time T2[0]
corresponding to the first video element 204a becomes 5, and
the time to display the first foreground image (adv) is
five seconds after the scene display start time (0sec.).
[0187] Thereafter, in the control data recording unit
103, it is decided whether or not the first and second set
times have been calculated for all of the video elements
shown in the scene description SD1 (step S506). When
the first and second set times have already been
calculated for all of the video elements, the first and second
set times T1[n] and T2[n] (n=0,1) of the respective video
elements and the scene display start time Tbg are
entered in the time field of the time table (step S508).
[0188] On the other hand, when the first and second 
set times have not yet been calculated, the value of the 
variable n is incremented by 1 (step S507), and the proc- 
esses in steps S502 to S506 are repeated. 
[0189] In the data reception apparatus 110, when
calculation of the first and second set times for the video
element 204a has been completed, calculation of the
set times for the video element 205a is not completed
yet. Therefore, the value of the variable n is incremented
by 1 (step S507), and the processes in steps S502 to
S506 are performed on the video element 205a.
[0190] When calculating the first and second set times
for the video element 205a, 10, 15, and 0 (sec.) based
on the scene display start time (0sec.) are assigned to
the variables B, P, and C in formula (1), respectively. As
the result, the first set time T1[1] of the second video
element 205a becomes -5, and the time to issue the
control command requesting the media data of the second
foreground image (mov) is five seconds before the
scene display start time (0sec.). Further, the second set
time T2[1] of the second video element 205a becomes
10, and the time to display the second foreground image
(mov) is ten seconds after the scene display start time
(0sec.).

[0191] At this point of time, calculation of the first and
second set times for all of the video elements has been
completed. Therefore, the first and second set times
T1[0] and T2[0] of the video element 204a, the first and
second set times T1[1] and T2[1] of the video element
205a, and the scene display start time Tbg are entered
in the time field of the time table (step S508).
[0192] That is, in the time field of the time table, -5,
-2, 0, 5, and 10 sec. are entered in this order as time
information of control commands.
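
The flow of figure 6 as described in steps S501 to S508 can be traced with the following Python sketch. It is a minimal illustration assuming that each video element is given as a (B, P) pair in seconds; `set_times` and `time_field` are hypothetical function names, and the step numbers in the comments map each line back to the description.

```python
def set_times(videos: list, c: int = 0) -> tuple:
    """Compute T1[n] and T2[n] for each video element.

    `videos` holds (B, P) pairs: begin time and prebuffering time.
    `c` is the predicted request-to-reception delay C (0 in this embodiment).
    """
    t1, t2 = [], []
    n = 0                          # S501: n starts at zero
    while n < len(videos):         # S506: repeat until all elements are done
        b, p = videos[n]           # S502, S503: select element, set B and P
        t1.append(b - p - c)       # S504: formula (1), T1[n] = B - P - C
        t2.append(b)               # S505: formula (2), T2[n] = B
        n += 1                     # S507: move to the next element
    return t1, t2


def time_field(t1: list, t2: list, tbg: int = 0) -> list:
    """S508: enter all set times and Tbg in chronological order."""
    return sorted(t1 + t2 + [tbg])


# Video elements 204a (adv) and 205a (mov): (begin, prebuffering).
T1, T2 = set_times([(5, 7), (10, 15)])
```

Passing a nonzero `c` shifts every request time earlier by that amount while leaving the display times unchanged.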
[0193] As described above, the data reception
apparatus 110 of the first embodiment is provided with the
SMIL request/reception unit 102 which requests the
server 13a to transmit the SMIL data Ds1 as data
indicating the scene description SD1 for combining the first
and second foreground images (adv) and (mov) with the
background image (bg) to display the composite image,
and receives the SMIL data Ds1 from the server 13a;
the data request/reception unit 105 which requests the
servers 13b and 13c to transmit the media data Dm1
and Dm2 of the respective foreground images, and
receives the messages from the servers; and the control
data generation unit 110a which controls the data
request/reception unit 105 so that the media data request
messages are issued to the corresponding servers at
times earlier than the display start times of the respective
foreground images by the latency times, on the basis
of the information indicating the latency times before
starting display of the respective foreground images
included in the SMIL data Ds1. Therefore, each
foreground image can be combined with the background
image to display a composite image, at the time
designated by the scene description.

[0194] Further, by setting the latency time at a
sufficiently large value considering the condition of the
network through which the media data is transmitted (e.g.,
band width, congestion, etc.), playback of the media
data by the data reception apparatus is hardly affected by
jitter in the network, thereby preventing the image
display from being interrupted during playback of the media
data.

[0195] Furthermore, in the data reception apparatus
110 of this first embodiment, the control data recording
unit 103 manages the time to request media data of
each foreground image from the server and the time to
display each foreground image, with reference to the
time table containing information about these times, on
the basis of the SMIL data. Further, the control data
recording unit 103 issues a control command to instruct
the data request unit to make a request for media data
or a control command to instruct the display unit to start
display of media data, every time the clock time in the
reception apparatus reaches the time described on the
time table. Therefore, even when the number of
foreground images constituting the composite image is
increased, comparison of the clock time with the time
information described on the time table permits the data
request unit to make a request for each media data at
an appropriate time before starting display of each
foreground image, whereby each foreground image is
displayed satisfactorily.

[0196] In this first embodiment, the control data recording
unit 103 calculates the time to issue a control command with
the delay time C set at 0, where the delay time C is the time
required from when the unit 103 issues a control command
requesting media data from the server to when the media data is
received. However, this delay time C may be set at an arbitrary
number larger than 0 according to the type of the network
(e.g., a network including a radio communication line, or a
network comprising only a wired communication line).
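The timing rule described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the names `Entry` and `build_time_table` are hypothetical, times are expressed in seconds relative to the start of the scene, and the begin times of 5 s and 10 s are assumed values chosen to go with latency times of 7 s and 15 s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of a hypothetical time table."""
    time_s: float   # seconds relative to the scene start (negative = before)
    action: str     # "request" or "display"
    media: str


def build_time_table(elements, delay_c=0.0):
    """Return request and display entries sorted by time.

    elements: iterable of (media, begin_s, prebuffering_s) tuples.
    delay_c:  assumed request-to-reception delay C in seconds
              (0 in the first embodiment).
    """
    table = []
    for media, begin_s, prebuffering_s in elements:
        # Issue the request the latency time (plus C) before display starts.
        table.append(Entry(begin_s - prebuffering_s - delay_c, "request", media))
        table.append(Entry(begin_s, "display", media))
    return sorted(table, key=lambda e: (e.time_s, e.action))


if __name__ == "__main__":
    # Assumed example values, not taken from the figures.
    for entry in build_time_table([("adv.mpg", 5.0, 7.0), ("mov.mpg", 10.0, 15.0)]):
        print(entry)
```

A controller could then compare its clock with the first pending entry and issue the corresponding control command when the clock reaches that entry's time.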
[0197] While in this first embodiment the data reception
apparatus receives video data as media data, the media data is
not restricted to video data, and it may be text data, audio
data, or the like. Also in this case, the same effects as
mentioned above are achieved.
[0198] While in this first embodiment the video data supplied
to the data reception apparatus have been compressively coded
by MPEG, the video data may have been compressively coded by
other coding methods, such as JPEG (Joint Photographic Experts
Group), GIF (Graphics Interchange Format), H.261, H.263, and
the like.

[0199] While in this first embodiment the scene description
data designates RTSP as a transmission protocol for making a
data request, the scene description data may designate other
protocols such as HTTP (Hyper Text Transfer Protocol) and the
like.
[0200] Furthermore, in this first embodiment, the control data
recording unit 103 calculates the time to issue a control
command to the data request unit or the display unit, and the
signal generation unit 104 sets the control command issue time
calculated by the unit 103 as a trigger generation time, and
outputs a trigger signal to the control data recording unit 103
every time the clock time in the signal generation unit 104
reaches the set trigger generation time. However, the time to
issue a control command to the data request/reception unit or
the display unit may be calculated by the signal generation
unit 104. In this case, the control data recording unit 103
must manage the respective control commands according to their
issue times.

[0201] While in this first embodiment the data recep- 
tion apparatus calculates the time to request media data 
from the server by using the prebuffering attribute value 
which is attached to the video element and indicates the 
latency time, the data reception apparatus may calcu- 
late the media data request time by using a request at- 
tribute value which indicates the time to output a data 
request message to the server. 

[Embodiment 2] 

[0202] Figure 7 is a block diagram for explaining a data
reception apparatus 120 according to a second embodiment of
the present invention.
[0203] The data reception apparatus 120 of this second
embodiment employs, as scene description data, SMIL data Ds2
which is different from the SMIL data Ds1 used for the first
embodiment, and the apparatus 120 is provided with a control
data generation unit 120a for generating control data Dc1 and
Dc2 on the basis of the SMIL data Ds2, instead of the control
data generation unit 110a of the first embodiment for
generating control data Dc1 and Dc2 on the basis of the SMIL
data Ds1.
[0204] Figure 8 is a diagram illustrating the contents of the
SMIL data Ds2 (scene description SD2) supplied as scene
description data to the data reception apparatus 120.

[0205] The SMIL data Ds2 includes a request attribute value
indicating the time to output a data request message to the
server, instead of the prebuffering attribute in the SMIL data
Ds1. That is, in the SMIL data Ds2, a video element 601a has a
request attribute (request="-2s") indicating that a request
message for image data (adv.mpg) is output two seconds before
starting scene display, instead of the prebuffering attribute
(prebuffering="7s") possessed by the video element 201a of the
SMIL data Ds1. Further, in the SMIL data Ds2, a video element
602a has a request attribute (request="-5s") indicating that a
request message for image data (mov.mpg) is output five seconds
before starting scene display, instead of the prebuffering
attribute (prebuffering="15s") possessed by the video element
202a of the SMIL data Ds1.
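As a rough illustration of how a receiver might read such request attribute values, the following Python sketch parses a small SMIL-like fragment. The fragment itself is hypothetical (it is not the content of figure 8), and the helper names `parse_seconds` and `request_times` are assumptions introduced only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the actual contents of figure 8 are not reproduced here.
SMIL_FRAGMENT = """
<par>
  <video src="rtsp://s2.com/adv.mpg" request="-2s"/>
  <video src="rtsp://s3.com/mov.mpg" request="-5s"/>
</par>
"""


def parse_seconds(value):
    """Convert a value such as "-2s" or "15s" to a number of seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported time value: {value!r}")
    return float(value[:-1])


def request_times(smil_text):
    """Map each video source to its request time relative to the scene start."""
    root = ET.fromstring(smil_text)
    return {
        video.get("src"): parse_seconds(video.get("request"))
        for video in root.iter("video")
    }


if __name__ == "__main__":
    for src, offset in request_times(SMIL_FRAGMENT).items():
        print(f"{src}: request at {offset:+.0f} s relative to scene start")
```

A negative value means the request message is output before the scene display starts, which matches the meaning given to the request attribute in the text.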

[0206] In figure 8, a row 601 including the video element 601a
corresponds to the row 210 including the video element 201a in
the scene description SD1 of the first embodiment, and a row
602 including the video element 602a corresponds to the row 202
including the video element 202a in the scene description SD1
of the first embodiment.

[0207] In the data reception apparatus 120 of this second
embodiment, a control data generation unit 120a comprises a
control data recording unit 123, and a trigger signal
generation unit 124. The control data recording unit 123
creates a time table Tab shown in figure 4, outputs the control
data Dc1 and Dc2, and outputs time information It, on the basis
of SMIL analysis data Da2 obtained by analyzing the SMIL data
Ds2. The trigger signal generation unit 124 is identical in
construction to the trigger signal generation unit 104 included
in the data reception apparatus 110 of the first embodiment.
[0208] Also in the data reception apparatus 120, as in the data
reception apparatus 110 of the first embodiment, a request
message for each media data is issued to a specific server at a
time a predetermined period earlier than the display start time
of each foreground image, on the basis of the SMIL data Ds2
supplied from the server, whereby the foreground image can be
combined with the background image for display at the time
designated by the scene description.

[0209] In this second embodiment, a media data request message
is issued when the clock time in the data reception apparatus
reaches the time designated by the request attribute in the
SMIL data Ds2. However, as described for the first embodiment,
the media data request message may be issued considering the
constant C, which is the time required from when the media data
request message is transmitted to the server to when the media
data is received. That is, the request message may be issued at
a time that is earlier, by the constant C, than the time
designated by the request attribute.

[0210] Further, while in the first and second embodiments the
attribute indicating the latency time before starting display
of each foreground image is called "prebuffering attribute" and
the attribute indicating the time to issue a media data request
message is called "request attribute", these attributes may be
called by other names so long as the meanings are the same.
[0211] While in the first embodiment the latency time from when
image data of an image is requested from the server to when
display of the image is started is determined on the basis of
the SMIL data, the latency time may be determined on the basis
of control data other than the SMIL data. For example, when
inputted image data is data (bit stream) which has been encoded
by MPEG coding, the latency time may be set on the basis of VBV
(Video Buffer Verifier) delay information which is multiplexed
in a header of each frame of a video bit stream, such that the
latency time is longer than the delay time indicated by this
information. In this case, the following effects are achieved.

[0212] In a video decoder receiving a bit stream transmitted at
a constant transmission rate, since the amount of video data
varies frame by frame, the latency time from when a bit stream
is received to when the bit stream is decoded varies frame by
frame. The VBV delay value multiplexed in the header of each
frame of the video bit stream shows this delay time. Therefore,
by starting decoding of the video bit stream when the time
indicated by the VBV delay value has passed after reception of
the video data, the buffer of the decoder is prevented from
underflow or overflow. However, since the information
indicating the VBV delay value is multiplexed in the bit stream
itself, it is impossible to know the VBV delay value in advance
of reception of the bit stream.
[0213] While in the first and second embodiments the data
reception unit 106a (106b) outputs one frame of media data to
the decoding unit 107a (107b) every time it receives one frame
of media data, the construction of the data reception unit is
not restricted thereto.
[0214] For example, the data reception unit 106a (106b) may
have a memory to hold the received media data, and it may read
the media data from the memory to output it to the decoding
unit 107a (107b) at the time when the clock time in the data
reception apparatus reaches the display start time indicated by
the begin attribute in the SMIL data. Alternatively, the data
reception unit 106a (106b) may have a memory to hold the
received media data, and it may read the media data from the
memory to output it to the decoding unit 107a (107b) at the
time when the clock time in the data reception apparatus
reaches a time a predetermined period (e.g., one second) before
the display start time indicated by the begin attribute in the
SMIL data.
[0215] In the above-described construction which 
starts decoding on the received media data at the dis- 
play start time indicated by the begin attribute, however, 
since decoding of the media data is started at the media 
data display start time, there is the possibility that the 
decoded image data is not stored in the frame memory 
by a predetermined time, depending on the perform- 
ance of the decoding unit. 
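The two buffering variants described above can be sketched as a single Python class with a configurable lead time. This is only a sketch under stated assumptions: the class name `BufferedReceptionUnit`, its methods, and the representation of frames as plain objects are hypothetical and are not taken from the embodiments.

```python
class BufferedReceptionUnit:
    """Holds received media data and releases it to the decoder at a set time.

    begin_s: display start time given by the begin attribute (seconds).
    lead_s:  how long before begin_s the data is handed to the decoder;
             0 models the first variant, e.g. 1.0 models the second.
    """

    def __init__(self, begin_s, lead_s=0.0):
        self.release_s = begin_s - lead_s
        self._frames = []

    def receive(self, frame):
        """Store one received frame in memory."""
        self._frames.append(frame)

    def poll(self, clock_s):
        """Return buffered frames once the clock has reached the release time."""
        if clock_s < self.release_s:
            return []
        frames, self._frames = self._frames, []
        return frames


if __name__ == "__main__":
    unit = BufferedReceptionUnit(begin_s=10.0, lead_s=1.0)
    unit.receive("frame-0")
    print(unit.poll(8.0))   # still before the release time
    print(unit.poll(9.0))   # release time reached
```

Choosing a non-zero lead time gives the decoding unit time to fill the frame memory before the display start time, which is the concern raised for the first variant.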

[0216] While in the first and second embodiments the SMIL data
is received as control data, the control data is not restricted
to the SMIL data. For example, the control data may be any of
the following: XHTML (Extensible Hyper Text Markup Language)
defined by W3C, HTML (Hyper Text Markup Language) + TIME (Timed
Interactive Multimedia Extensions), SDP (Session Description
Protocol) defined by IETF (Internet Engineering Task Force),
and BIFS (Binary Format for Scene) defined by the MPEG
standard.

[0217] Further, while in the first and second embodi- 
ments the data reception apparatus is implemented by 
hardware, it may be implemented by software. 
[0218] For example, in the data reception apparatus 110, the
SMIL request/reception unit 102, the signal generation unit
104, the data request/reception unit 105, the media data
reception units 106a and 106b, the decoding units 107a and
107b, and the display unit 109 can be implemented in a computer
system using a software program in which the functions of these
units are programmed so as to be performed by a CPU (Central
Processing Unit).

[0219] Even when the data reception apparatus 110 of the first
embodiment is implemented by software, the same effects as
described for the first embodiment are achieved.

[0220] The above-described software program can 
be stored in storage media, such as a floppy disk, an 
optical disk, an IC card, a ROM cassette, and the like. 
[0221] In the first and second embodiments, a server (data
transmission apparatus) corresponding to a receiving terminal
(client terminal) having a data reception apparatus which
receives media data and control data such as SMIL data,
transmits the SMIL data including information indicating the
latency time before display of media data (prebuffering
attribute) and information indicating the time to request media
data from the server (request attribute), to the receiving
terminal. However, the data transmission apparatus may transmit
control data other than the SMIL data, including the
information indicating the latency time and the information
indicating the data request time.

[0222] For example, a server (data transmission ap- 
paratus) corresponding to a receiving terminal which 
combines plural pieces of media data for display, trans- 
mits control data such as response data to the request 
from the receiving terminal, which control data includes 
the prebuffering attribute value, the request attribute val- 
ue, or an attribute value equivalent to them, before 
transmission of media data, and then transmits media 
data according to a data request message from the re- 
ceiving terminal. Also in this case, the receiving terminal 
can make a request for media data at an appropriate 
data request time. 

[0223] Hereinafter, a description will be given of an 
example of data exchange between a data reception ap- 
paratus included in a receiving terminal and a data 
transmission apparatus which transmits the information 
indicating the latency time, included in control data other 
than the SMIL data. 

[0224] Figure 9 is a diagram illustrating an example of data
exchange between a data transmission apparatus (server) for
transmitting media data and a data reception apparatus, in the
case where the data transmission apparatus transmits the
information indicating the prebuffering time (latency time)
included in SDP.
[0225] In figure 9, each of a second server (data transmission
apparatus) 23b transmitting the media data of the first
foreground image and a third server (data transmission
apparatus) 23c transmitting the media data of the second
foreground image, transmits SDP including the prebuffering time
(latency time). A first server (data transmission apparatus)
23a is identical in construction to the first server 13a shown
in figure 14. A data reception apparatus 130a is mounted on a
personal computer as a client terminal, and is supplied with
SMIL data Ds indicating the scene description SD shown in
figure 12. The construction of the data reception apparatus
130a is identical to the conventional data reception apparatus
901 (refer to figure 13) except the data request/reception unit
908. That is, in the data reception apparatus 130a, the data
request/reception unit obtains the information indicating the
prebuffering time (latency time), and supplies it to the
control data generation unit 907.
[0226] When the user, who is viewing a home page described by
HTML (Hyper Text Markup Language) using a Web browser installed
on the personal computer, clicks a region on the home page
linked to predetermined SMIL data, the data reception apparatus
130a of the client terminal issues an SMIL request command (GET
http://s1.com/scene.smil) C1 requesting the SMIL data Ds. This
command C1 requests the first server (s1.com) 23a to distribute
the SMIL data by HTTP.
[0227] On receipt of the SMIL request command C1, the server
23a issues an acknowledge (HTTP/1.0 OK) R1 indicating that the
command has been accepted, to the client terminal, and
transmits the SMIL data (scene.smil) Ds to the client terminal.

[0228] In the data reception apparatus 130a of the client
terminal, the SMIL request/reception unit 906 receives the SMIL
data Ds, and analyzes the SMIL data Ds.

[0229] The SMIL analysis data Da obtained by the 
analysis on the SMIL data is stored in the control data 
generation unit 907. 

[0230] Thereafter, the data reception apparatus 130a issues a
command (DESCRIBE rtsp://s3.com/mov.mpg) C2a requesting
specific information relating to the media data corresponding
to the second foreground image (mov) (e.g., coding condition,
existence of plural candidate data, etc.), to the third server
(s3.com) 23c.
[0231] On receipt of the command C2a, the third server 23c
issues an acknowledge R20a indicating that the command has been
accepted, to the client terminal. This acknowledge R20a
includes an OK message (RTSP/1.0 OK) 21a indicating that the
DESCRIBE command C2a has been accepted, and SDP (Session
Description Protocol) information R22a. The SDP information
R22a includes (a=prebuffering:15s) information, in addition to
information required for decoding of media data at the
receiving terminal and information required for transmission of
media data. In the SDP information R22a, "v=0" indicates
version information relating to the construction of the SDP,
and "m=video" indicates that information relating to video data
is described after the "m=video". The (a=prebuffering:15s)
information indicates that the latency time from requesting the
media data of the foreground image (mov) to displaying the
foreground image is 15 seconds.
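For illustration, a receiving terminal could extract such a latency value from an SDP body along the following lines. This Python sketch rests on assumptions: the SDP text is an abbreviated, hypothetical body containing only the three fields named in the description, `prebuffering_seconds` is a hypothetical helper, and a=prebuffering is the attribute described in this document rather than a standard SDP attribute.

```python
# Abbreviated, hypothetical SDP body; a complete SDP description carries more fields.
SDP_BODY = "v=0\r\nm=video\r\na=prebuffering:15s\r\n"


def prebuffering_seconds(sdp_text):
    """Return the a=prebuffering value in seconds, or None if it is absent."""
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("a=prebuffering:"):
            value = line.split(":", 1)[1].strip()
            if value.endswith("s"):
                value = value[:-1]  # drop the unit suffix
            return float(value)
    return None


if __name__ == "__main__":
    print(prebuffering_seconds(SDP_BODY))
```

The returned value would then be passed to whichever component schedules the data request, in the same way as a latency time read from SMIL data.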
[0232] Next, the data reception apparatus 130a of the client
terminal issues a setup request command (SETUP
rtsp://s3.com/mov.mpg) C3a which requests the third server
(s3.com) 23c to set up provision of the media data
corresponding to the second foreground image (mov), to the
third server 23c. Upon completion of setup for the media data,
the third server 23c issues an acknowledge (RTSP/1.0 OK) R3a
indicating that the command C3a has been accepted, to the
client terminal.
[0233] Subsequently, the data reception apparatus 130a issues a
command (DESCRIBE rtsp://s2.com/adv.mpg) C2b requesting
specific information relating to the media data corresponding
to the first foreground image (adv) (e.g., coding condition,
existence of plural candidate data, etc.), to the second server
(s2.com) 23b.
[0234] On receipt of the command C2b, the second server 23b
issues an acknowledge R20b indicating that the command has been
accepted, to the client terminal. This acknowledge R20b
includes an OK message (RTSP/1.0 OK) 21b indicating that the
DESCRIBE command C2b has been accepted, and SDP (Session
Description Protocol) information R22b. The SDP information
R22b includes (a=prebuffering:7s) information as well as (v=0)
information and (m=video) information. The (a=prebuffering:7s)
information indicates that the latency time from requesting the
media data (adv.mpg) of the first foreground image (adv) to
displaying the foreground image is 7 seconds.

[0235] Next, the data reception apparatus 130a of the client
terminal issues a setup request command (SETUP
rtsp://s2.com/adv.mpg) C3b which requests the second server
(s2.com) 23b to set up provision of the media data
corresponding to the first foreground image (adv), to the
second server 23b. Upon completion of setup for the media data,
the second server 23b issues an acknowledge (RTSP/1.0 OK) R3b
indicating that the command C3b has been accepted, to the
client terminal.
[0236] Thereafter, the data reception apparatus 130a of the
client terminal issues a data request command (PLAY
rtsp://s3.com/mov.mpg) C4a requesting the media data (mov.mpg)
corresponding to the second foreground image (mov), to the
third server (s3.com) 23c, fifteen seconds before the display
start time of the second foreground image (five seconds before
the display start time of the entire scene). On receipt of this
command C4a, the third server 23c issues an acknowledge
(RTSP/1.0 OK) R4a indicating that the command C4a has been
accepted, to the client terminal. Thereafter, the third server
23c transmits the media data Dm2 corresponding to the second
foreground image (mov.mpg), which media data is stored in RTP
packets, to the client terminal.

[0237] Further, the data reception apparatus 130a of the client
terminal issues a data request command (PLAY
rtsp://s2.com/adv.mpg) C4b requesting the media data (adv.mpg)
corresponding to the first foreground image (adv), to the
second server (s2.com) 23b, seven seconds before the display
start time of the first foreground image (two seconds before
the display start time of the entire scene). On receipt of this
command C4b, the second server 23b issues an acknowledge
(RTSP/1.0 OK) R4b indicating that the command C4b has been
accepted, to the client terminal. Thereafter, the second server
23b transmits the media data Dm1 corresponding to the first
foreground image (adv.mpg), which media data is stored in RTP
packets, to the client terminal.
[0238] Thereafter, the respective media data are output to the
display unit 905 at the display start times to be displayed on
the basis of the result of analysis performed on the SMIL data.
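The order of the commands in this exchange can be summarized with a short Python sketch. It is a simplified model rather than the apparatus itself: `command_sequence` is a hypothetical helper, no network traffic is generated, and the display start times of 10 s and 5 s after the scene start are inferred from the stated figures (15 s and 7 s of latency, requested 5 s and 2 s before the scene starts).

```python
def command_sequence(streams):
    """Return (time, command) pairs in the order the client would issue them.

    streams: list of (url, begin_s, prebuffering_s), in set-up order.
    DESCRIBE and SETUP carry no scheduled time (None); each PLAY is
    scheduled the latency time before the stream's display start time.
    """
    commands = []
    for url, _begin_s, _prebuffering_s in streams:
        commands.append((None, f"DESCRIBE {url}"))
        commands.append((None, f"SETUP {url}"))
    plays = sorted(
        (begin_s - prebuffering_s, f"PLAY {url}")
        for url, begin_s, prebuffering_s in streams
    )
    return commands + plays


if __name__ == "__main__":
    # Times are in seconds relative to the start of the scene (inferred values).
    for when, command in command_sequence([
        ("rtsp://s3.com/mov.mpg", 10.0, 15.0),
        ("rtsp://s2.com/adv.mpg", 5.0, 7.0),
    ]):
        print(when, command)
```

In this model the PLAY command for mov.mpg is scheduled before the one for adv.mpg, even though adv.mpg is displayed first, because its latency time is longer.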

[0239] The above-described method of transmitting the
information which indicates the latency time (prebuffering
time) and is included in the control data (SDP data) other than
SMIL data, from the data transmission apparatus to the data
reception apparatus, is very effective for contents whose
initial delay time (i.e., latency time from requesting media
data from the server to starting display of the media data)
varies in real time (e.g., video of a concert which is
broadcast live).

[0240] Figure 10 is a diagram illustrating an example of data
exchange between a data transmission apparatus (server) for
transmitting media data and a data reception apparatus, in the
case where the data transmission apparatus transmits the
information indicating the prebuffering time (latency time),
which information is included in an acknowledge to a SETUP
request of RTSP.
[0241] In figure 10, each of a second server (data transmission
apparatus) 33b transmitting the media data of the first
foreground image and a third server (data transmission
apparatus) 33c transmitting the media data of the second
foreground image, transmits the information indicating
prebuffering time (latency time), included in an acknowledge to
a SETUP request of RTSP. A first server (data transmission
apparatus) 33a is identical in construction to the first server
13a shown in figure 14. A data reception apparatus 130b is
mounted on a personal computer as a client terminal, and is
supplied with the SMIL data shown in figure 12 as scene
description data SD. The construction of the data reception
apparatus 130b is identical to the conventional data reception
apparatus 901 (refer to figure 13) except the data
request/reception unit 908. That is, in the data reception
apparatus 130b, the data request/reception unit obtains the
information indicating the prebuffering time (latency time),
and supplies it to the control data generation unit 907.

[0242] When the user, who is viewing a home page described by
HTML (Hyper Text Markup Language) using a Web browser installed
on the personal computer, clicks a region on the home page
linked to predetermined SMIL data, the data reception apparatus
130b of the client terminal issues an SMIL request command (GET
http://s1.com/scene.smil) C1 requesting the SMIL data Ds. This
command C1 requests the first server (s1.com) 33a to distribute
the SMIL data by HTTP.
[0243] On receipt of the SMIL request command C1, the server
33a issues an acknowledge (HTTP/1.0 OK) R1 indicating that the
command has been accepted, to the client terminal, and
transmits the SMIL data (scene.smil) Ds to the client terminal.

[0244] In the data reception apparatus 130b of the client
terminal, the SMIL request/reception unit 906 receives the SMIL
data Ds, and analyzes the SMIL data Ds.

[0245] The SMIL analysis data Da obtained by the 
analysis on the SMIL data is stored in the control data 
generation unit 907. 

[0246] Thereafter, the data reception apparatus 130b issues a
command (DESCRIBE rtsp://s3.com/mov.mpg) C2a requesting
specific information relating to the media data corresponding
to the second foreground image (mov) (e.g., coding condition,
existence of plural candidate data, etc.), to the third server
(s3.com) 33c.
[0247] On receipt of the command C2a, the third serv- 
er 33c issues an acknowledge R2a indicating that the 
command has been accepted, to the client terminal, and 
transmits SDP (Session Description Protocol) informa- 
tion to the client terminal. 

[0248] Next, the data reception apparatus 130b of the client
terminal issues a setup request command (SETUP
rtsp://s3.com/mov.mpg) C3a which requests the third server
(s3.com) 33c to set up provision of the media data
corresponding to the second foreground image (mov), to the
third server 33c. Upon completion of setup for the media data,
the third server 33c issues an acknowledge R30a indicating that
the command C3a has been accepted, to the client terminal.
[0249] This acknowledge R30a includes an OK message (RTSP/1.0
OK) 31a indicating that the SETUP command C3a has been
accepted, and additional information 32a. The additional
information 32a includes (a=prebuffering:15s) information, in
addition to sequence number (CSeq:2) information, session
number (Session: 12345678) information, and the like. The
(a=prebuffering:15s) information indicates that the latency
time from requesting the media data of the foreground image
(mov) to displaying the foreground image is 15 seconds. The
sequence number (CSeq:2) is assigned to one message exchange
between the data transmission apparatus and the data reception
apparatus, and the same sequence number is assigned to the
issue of a command from the receiving terminal and to the
acknowledge to that command from the server. Accordingly,
although it is not shown in figure 10, a sequence number
(CSeq:1) is given to the command (DESCRIBE
rtsp://s3.com/mov.mpg) C2a and to the acknowledge R2a. The
session number is assigned to the state, established between
the data transmission apparatus and the data reception
apparatus, in which data transmission is allowed.
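A receiving terminal could pick these fields out of a SETUP acknowledge roughly as shown in the Python sketch below. The response text is a hypothetical reconstruction based only on the fields named in the description (the exact layout of figure 10 is not reproduced), the status line is written in the usual RTSP form with a numeric status code, and `parse_setup_ack` is a hypothetical helper.

```python
# Hypothetical acknowledge text modelled on the fields named in the description.
SETUP_RESPONSE = (
    "RTSP/1.0 200 OK\r\n"
    "CSeq: 2\r\n"
    "Session: 12345678\r\n"
    "a=prebuffering:15s\r\n"
    "\r\n"
)


def parse_setup_ack(text):
    """Split an acknowledge into its status line and a dict of fields.

    Field names are lower-cased; an "a=" prefix, as used for the
    prebuffering information, is removed from the field name.
    """
    lines = text.split("\r\n")
    status = lines[0]
    fields = {}
    for line in lines[1:]:
        if not line:
            break  # an empty line ends the header section
        if line.startswith("a="):
            line = line[2:]
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return status, fields


if __name__ == "__main__":
    status, fields = parse_setup_ack(SETUP_RESPONSE)
    print(status, fields)
```

Comparing the parsed `cseq` value with the sequence number of the SETUP command that was sent lets the terminal pair each acknowledge with its command, as the paragraph above describes.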

[0250] Subsequently, the data reception apparatus 130b issues a
command (DESCRIBE rtsp://s2.com/adv.mpg) C2b requesting
specific information relating to the media data corresponding
to the first foreground image (adv) (e.g., coding condition,
existence of plural candidate data, etc.), to the second server
(s2.com) 33b.
[0251] On receipt of the command C2b, the second server 33b
issues an acknowledge R2b indicating that the command has been
accepted, to the client terminal, and transmits SDP (Session
Description Protocol) information to the client terminal.

[0252] Next, the data reception apparatus 130b of the client
terminal issues a setup request command (SETUP
rtsp://s2.com/adv.mpg) C3b which requests the second server
(s2.com) 33b to set up provision of the media data
corresponding to the first foreground image (adv), to the
second server 33b. Upon completion of setup for the media data,
the second server 33b issues an acknowledge R30b indicating
that the command C3b has been accepted, to the client terminal.

[0253] This acknowledge R30b includes an OK message (RTSP/1.0
OK) 31b indicating that the SETUP command C3b has been
accepted, and additional information 32b. The additional
information 32b includes (a=prebuffering:7s) information, in
addition to sequence number (CSeq:2) information, session
number (Session: 12345688) information, and the like. The
(a=prebuffering:7s) information indicates that the latency time
from requesting the media data of the foreground image (adv) to
displaying the foreground image is 7 seconds.
[0254] Thereafter, the data reception apparatus 130b of the
client terminal issues a data request command (PLAY
rtsp://s3.com/mov.mpg) C4a requesting the media data (mov.mpg)
corresponding to the second foreground image (mov), to the
third server (s3.com) 33c, fifteen seconds before the display
start time of the second foreground image (five seconds before
the display start time of the entire scene). On receipt of this
command C4a, the third server 33c issues an acknowledge
(RTSP/1.0 OK) R4a indicating that the command C4a has been
accepted, to the client terminal. Thereafter, the third server
33c stores the media data Dm2 corresponding to the second
foreground image (mov.mpg) in RTP packets, and transmits the
media data, packet by packet, to the client terminal.

[0255] Further, the data reception apparatus 130b of the client
terminal issues a data request command (PLAY
rtsp://s2.com/adv.mpg) C4b requesting the media data (adv.mpg)
corresponding to the first foreground image (adv), to the
second server (s2.com) 33b, seven seconds before the display
start time of the first foreground image (two seconds before
the display start time of the entire scene). On receipt of this
command C4b, the second server 33b issues an acknowledge
(RTSP/1.0 OK) R4b indicating that the command C4b has been
accepted, to the client terminal. Thereafter, the second server
33b stores the media data Dm1 corresponding to the first
foreground image (adv.mpg) in RTP packets, and transmits the
media data, packet by packet, to the client terminal.

[0256] Thereafter, the respective media data are output to the
display unit 905 at the display start times to be displayed on
the basis of the result of analysis performed on the SMIL data.

[0257] The above-described method of transmitting the
information which indicates the latency time (prebuffering
time) and is included in the control data (acknowledge of the
server to the SETUP request from the receiving terminal) other
than SMIL data, from the data transmission apparatus to the
data reception apparatus, is very effective for contents whose
initial delay time (i.e., latency time from requesting media
data from the server to starting display of the media data)
varies in real time (e.g., video of a concert which is
broadcast live).



Claims 

1. A data reception apparatus for obtaining media data
which is any of video data, audio data, and text data,
and corresponds to plural elements constituting a
scene, from data sources on a network, and playing
the obtained media data to display the scene, said
apparatus comprising:

a first reception unit for receiving location information
indicating the locations of the data sources having the
respective media data on the network, first time information
indicating the playback start times of the respective media
data, and second time information for requesting the
respective media data from the corresponding data source;

a time setting unit for setting a data request time
to make a request for each media data to the
corresponding data source, at a time by a specific
time set for each media data earlier than
the playback start time of the media data, on
the basis of the first and second time information;

a data request unit for making a request for
each media data to the data source indicated
by the location information, at the data request
time set by the time setting unit; and

a second reception unit for receiving the media
data supplied from the data source according
to the request from the data request unit.

2. The data reception apparatus of Claim 1:

wherein said first reception unit receives, as the
second time information, time information indicating
a latency time from when each media data is received
to when the media data is played; and

said time setting unit sets the data request time
for each media data, at a time earlier than the
playback start time of the media data by the latency
time.

3. The data reception apparatus of Claim 1:

wherein said first reception unit receives, as the
second time information, time information indicating
a time to make a request for each media data to the
corresponding data source; and

said time setting unit sets the data request time
for each media data, at the time indicated by the
second time information.

4. The data reception apparatus of Claim 1:

wherein said first reception unit receives, as the
second time information, time information indicating
a latency time from when each media data is received
to when the media data is played; and

said time setting unit sets the data request time
for each media data, at a time earlier than the
playback start time of the media data by the sum of
the latency time and a predetermined time.

5. The data reception apparatus of Claim 1:

wherein said first reception unit receives, as the
second time information, time information indicating
a time to make a request for each media data to the
corresponding data source; and

said time setting unit sets the data request time
for each media data, at a time earlier than the time
indicated by the second time information by a
predetermined time.

6. A data reception method for obtaining media data
which is any of video data, audio data, and text data,
and corresponds to plural elements constituting a
scene, from data sources on a network, and playing
the obtained media data to display the scene, said
method comprising:

a first reception step of receiving location information
indicating the locations of the data sources having the
respective media data on the network, first time
information indicating the playback start times of the
respective media data, and second time information for
requesting the respective media data from the
corresponding data sources;

a data request step of making a request for each
media data to the data source indicated by the
location information, at a time earlier than the
playback start time of the media data by a specific
time set for each media data, on the basis of the
first and second time information; and

a second reception step of receiving the media data
supplied from the data source according to the request
made in the data request step.

7. The data reception method of Claim 6:

wherein said first reception step receives, as the
second time information, time information indicating
a latency time from when each media data is received
to when the media data is played; and

said data request step makes a request for each
media data to a predetermined data source, at a time
earlier than the playback start time of the media
data by the latency time.

8. The data reception method of Claim 6:

wherein said first reception step receives, as the
second time information, time information indicating
a data request time to make a request for each media
data to the corresponding data source; and

said data request step makes a request for each
media data to the data source, at the data request
time.

9. The data reception method of Claim 6:

wherein said first reception step receives, as the
second time information, time information indicating
a latency time from when each media data is received
to when the media data is played; and

said data request step makes a request for each
media data to a predetermined data source, at a time
earlier than the playback start time of the media
data by the sum of the latency time and a
predetermined time.

10. The data reception method of Claim 6:

wherein said first reception step receives, as the
second time information, time information indicating
a data request time to make a request for each media
data to the corresponding data source; and

said data request step makes a request for each
media data to the data source, at a time earlier than
the data request time by a predetermined time.
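
For illustration only, the four ways of deriving the time of the request recited in Claims 7 to 10 (which mirror Claims 2 to 5 of the apparatus) may be summarized in one Python function. The function name request_time, the kind labels "latency" and "absolute", and the parameter name margin (standing for the predetermined time) are assumptions introduced for this sketch, not terms of the specification.

```python
def request_time(playback_start: float, second_info: float,
                 kind: str, margin: float = 0.0) -> float:
    """Derive the time at which the request for one media data is issued.

    kind == "latency":  second_info is a latency time
                        (Claim 7 with margin == 0, Claim 9 with margin > 0).
    kind == "absolute": second_info is itself a data request time
                        (Claim 8 with margin == 0, Claim 10 with margin > 0).
    """
    if kind == "latency":
        # Earlier than the playback start time by (latency + predetermined time).
        return playback_start - (second_info + margin)
    if kind == "absolute":
        # Earlier than the signalled request time by the predetermined time.
        return second_info - margin
    raise ValueError(f"unknown kind of second time information: {kind!r}")


# Playback starts at 10 s; the second time information is either a 3 s latency
# or an absolute request time of 7 s (values chosen only for the example).
t7 = request_time(10.0, 3.0, "latency")               # 7.0
t8 = request_time(10.0, 7.0, "absolute")              # 7.0
t9 = request_time(10.0, 3.0, "latency", margin=1.0)   # 6.0
t10 = request_time(10.0, 7.0, "absolute", margin=0.5) # 6.5
```

In this reading, the predetermined time acts as a safety margin on top of whatever the second time information already signals.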

11. A data transmission method for transmitting media
data which is any of video data, audio data, and text
data and corresponds to plural elements constituting
a scene, to a reception terminal for playing the
media data to display the scene, said method
comprising:

a first transmission step of transmitting location
information indicating the locations of data sources
having the respective media data on a network, first
time information indicating the playback start times
of the respective media data, and second time
information for requesting the respective media data
from the corresponding data sources; and

a second transmission step of transmitting the media
data to the reception terminal, according to the
request for the media data which is issued from the
reception terminal on the basis of the first and
second time information and the location information.

12. The data transmission method of Claim 11, wherein
said second time information is time information
indicating a latency time from when each media data
is received to when the media data is played.

13. The data transmission method of Claim 11, wherein
said second time information is time information
indicating a data request time to make a request for
each media data to the corresponding data source.
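
As a non-limiting sketch, the control data sent in the first transmission step might be assembled as below, carrying either a latency time (Claim 12) or a data request time (Claim 13) as the second time information. The JSON layout and the keys "src", "begin", "latency" and "request_at" are hypothetical and are not the SMIL syntax of the embodiments; deriving the request time as playback start minus latency on the transmitting side is likewise an assumption of this example.

```python
import json


def build_control_data(items, mode: str) -> str:
    """Serialize control data for a list of (location, playback_start, latency)."""
    records = []
    for location, playback_start, latency in items:
        record = {"src": location, "begin": playback_start}
        if mode == "latency":
            # Claim 12: second time information is a latency time.
            record["latency"] = latency
        elif mode == "request_time":
            # Claim 13: second time information is a data request time
            # (assumed here to be computed by the transmitting side).
            record["request_at"] = playback_start - latency
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        records.append(record)
    return json.dumps(records)


items = [("rtsp://server1.example/fg1", 10.0, 3.0)]
as_latency = json.loads(build_control_data(items, "latency"))
as_request = json.loads(build_control_data(items, "request_time"))
```

Either form gives the reception terminal enough information to issue its request ahead of the playback start time.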

14. A data storage medium containing a data playback
program to make a computer perform a data playback
process of obtaining media data which is any of video
data, audio data, and text data, and corresponds to
plural elements constituting a scene, from data
sources on a network, and playing the obtained media
data to display the scene, said data playback program
comprising:

a first program to make the computer perform a first
process of receiving location information indicating
the locations of the data sources having the
respective media data, first time information
indicating the playback start times of the respective
media data, and second time information for
requesting the respective media data from the
corresponding data sources;

a second program to make the computer perform a
second process of making a request for each media
data to the data source indicated by the location
information, at a time earlier than the playback
start time of the media data by a specific time set
for each media data, on the basis of the first and
second time information; and

a third program to make the computer perform a third
process of receiving the media data supplied from the
data source according to the data request.
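
Purely as an illustration of why the request is issued early, the three processes above can be modeled on a simulated timeline in Python. The function simulate, the tuple layout, and the actual_delay table (an assumed network delay per media data) are hypothetical; the sketch only checks whether each media data would arrive no later than its playback start time.

```python
import heapq


def simulate(entries, actual_delay):
    """entries: list of (name, playback_start, latency) from the control data.
    actual_delay: dict mapping name -> seconds between request and arrival.
    Returns dict name -> True if the data arrives by its playback start time."""
    # Second process: each request is issued 'latency' before playback start.
    queue = [(start - latency, name, start) for name, start, latency in entries]
    heapq.heapify(queue)  # requests are handled in chronological order
    ready = {}
    while queue:
        request_time, name, start = heapq.heappop(queue)
        # Third process: the data arrives after the assumed network delay.
        arrival = request_time + actual_delay[name]
        ready[name] = arrival <= start
    return ready


result = simulate(
    [("fg1", 5.0, 2.0), ("fg2", 10.0, 4.0)],
    {"fg1": 1.5, "fg2": 4.5},
)
```

Here fg1 is requested at 3.0 s and arrives at 4.5 s, in time for playback at 5.0 s, whereas fg2 is requested at 6.0 s and arrives at 10.5 s, after its playback start time, because the assumed delay exceeds the signalled latency.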

15. A data storage medium which contains a data
transmission program to make a computer perform a
data transmission process of transmitting media data
which is any of video data, audio data, and text data
and corresponds to plural elements constituting a
scene, to a reception terminal for playing the media
data to display the scene, said data transmission
program comprising:

a first program to make the computer perform a first
process of transmitting location information
indicating the locations of data sources having the
respective media data on a network, first time
information indicating the playback start times of
the respective media data, and second time
information for requesting the respective media data
from the corresponding data sources; and

a second program to make the computer perform a
second process of transmitting the media data to the
reception terminal, according to the request for the
media data which is issued from the reception
terminal on the basis of the first and second time
information and the location information.

Fig. 6